(54) Video skimming system utilizing the vector rank filter



(57) Automated summarization of digital video sequences is accomplished using a vector rank filter (70). The consecutive frames of a digital video sequence can be represented as feature vectors which are successively accumulated in a set of vectors. The distortion of the set by the addition of each successive vector, or the cumulative distance from each successive vector to all other vectors in the set, is determined by a vector rank filter (70). When the distortion exceeds a threshold value, the end of a video segment is detected. Each frame in a video segment can be ranked according to its relative similarity to the other frames of the set by applying the vector rank filter (70) to the feature vectors representing the video frames. To produce a summary of a video sequence which is most representative of the content of the sequence, frames are chosen that correspond to vectors that are the least distant to, or produce the least distortion of, the set of vectors representing the segment. The ranking of the relative distortion can be used as the basis for selecting more than one frame from each segment to produce a hierarchy of summaries containing greater numbers of the frames having the most representative content.
Description

BACKGROUND OF THE INVENTION 

[0001] The present invention relates to digital video content analysis and more particularly to a system for summarizing digital video sequences as a series of representative key frames.

[0002] The increasing availability and use of video have created a need for video summaries and abstractions to aid users in effective and efficient browsing of potentially thousands of hours of video. Automation of video content analysis and extraction of key representative content to create summaries has increased in significance as video has evolved from an analog to a digital format. Digital television, digital video libraries, and the Internet are applications where an appliance that can "view" the video and automatically summarize its content might be useful.
[0003] Generally, a sequence of video includes a series of scenes. Each scene, in turn, includes a series of adjoining video "shots." A shot is a relatively homogeneous series of individual frames produced by a single camera focusing on an object or objects of interest belonging to the same scene. Generally, automated video content analysis and extraction involve "viewing" the video sequence, dividing the sequence into a series of shots, and selecting one or more "key frames" from each of the shots to represent the content of the shot. A summary of the video sequence results when the series of key frames is displayed. The summary of the video will best represent the video sequence if the frames which are most representative of the content of each shot are selected as key frames for inclusion in the summary. Creation of a hierarchy of summaries, including a greater or lesser number of key frames from each shot, is also desirable to satisfy the differing needs of users of the video.

[0004] The first step in the summarization process has been the division of the video into a series of shots of relatively homogeneous content. Video shot transitions can be characterized by anything from abrupt transitions occurring between two consecutive frames (cuts) to more gradual transitions, such as "fades," "dissolves," and "wipes." One technique for detecting the boundaries of a shot involves counting either the number of pixels or the number of predefined areas of an image that change in value by more than a predefined threshold in a subsequent frame. When either the total number of pixels or areas satisfying this first criterion exceeds a second predefined threshold, a shot boundary is declared. Statistical measures of the values of pixels in pre-specified areas of the frame have also been utilized for shot boundary detection. Pixel difference techniques can be sensitive to camera and object motion. Statistical techniques tend to be relatively slow due to the complexity of computing the statistical formulas.

[0005] Histograms and histogram-related statistics are the most common image representations used in shot boundary detection. Gray-level histograms, color histograms, or histogram-related statistics can be compared for successive frames. If the difference exceeds a predefined threshold, a shot boundary is detected. A second threshold test may also be included to detect the more gradual forms of shot transition.
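The two prior-art cut tests of [0004] and [0005] can be sketched as follows. This is a minimal illustration under stated assumptions, not the cited methods themselves: frames are assumed to be 8-bit gray-level images flattened into `int` arrays, the class and method names are hypothetical, the histogram difference is assumed to be a sum of absolute bin differences, and the second threshold test for gradual transitions is omitted.

```java
/** Hypothetical sketch of the pixel-count and histogram-difference cut tests. */
final class PriorArtCutDetectors {

    /** [0004]: count pixels whose value changes by more than pixelThreshold and
     *  declare a boundary when that count exceeds countThreshold. */
    static boolean pixelCountBoundary(int[] prev, int[] next,
                                      int pixelThreshold, int countThreshold) {
        int changed = 0;
        for (int i = 0; i < prev.length; i++) {
            if (Math.abs(next[i] - prev[i]) > pixelThreshold) {
                changed++;
            }
        }
        return changed > countThreshold;
    }

    /** Gray-level histogram with `bins` equal-width bins over 0..255 (assumed pixel range). */
    static int[] grayHistogram(int[] pixels, int bins) {
        int[] h = new int[bins];
        for (int p : pixels) {
            h[p * bins / 256]++;
        }
        return h;
    }

    /** [0005]: declare a boundary when the sum of absolute bin differences between the
     *  histograms of two successive frames exceeds histThreshold. */
    static boolean histogramBoundary(int[] prev, int[] next, int bins, int histThreshold) {
        int[] hp = grayHistogram(prev, bins);
        int[] hn = grayHistogram(next, bins);
        int diff = 0;
        for (int b = 0; b < bins; b++) {
            diff += Math.abs(hp[b] - hn[b]);
        }
        return diff > histThreshold;
    }
}
```

Because the histogram test ignores where pixels are located, it is less sensitive to camera and object motion than the pixel-count test, which is consistent with the trade-off noted in [0004].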

[0006] Selecting one or more key frames which best represent the relatively homogeneous frames of a shot has been more problematic than defining shot boundaries. Lagendijk et al., in a paper entitled VISUAL SEARCH IN A SMASH SYSTEM, Proceedings of the International Conference on Image Processing, pages 671-674, 1996, describe a process in which shot boundaries are determined by monitoring cumulative image histogram differences over time. The frames of each shot are temporally divided into groups reflecting the pre-specified number of key frames to be extracted from each shot. The frame at the middle of each group of frames is then selected as the key frame for that group. The selection of a key frame is arbitrary and may not represent the most "important" or "meaningful" frame of the group. Also, this process must be performed "off-line" with storage of the entire video for "review" and establishment of shot boundaries, followed by temporal segmentation of shots and then extraction of key frames. For key frame extraction, the stored video must be loaded into a processing buffer so that the group of frames and associated key frames can be calculated. The size of a shot is limited by the size of the processing buffer.
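The middle-of-group rule attributed to Lagendijk et al. above reduces to index arithmetic. In this hypothetical sketch a shot is given by its first frame index and its length, the number of groups equals the pre-specified number of key frames (assumed not to exceed the shot length), and the groups are assumed to be of nearly equal size; the cited paper may partition shots differently.

```java
/** Hypothetical sketch of middle-frame key frame selection within one shot. */
final class MiddleFrameSelector {

    /** Splits frames [start, start + length) into `groups` temporal groups of nearly
     *  equal size and returns the index of the middle frame of each group. */
    static int[] keyFrames(int start, int length, int groups) {
        int[] keys = new int[groups];
        for (int g = 0; g < groups; g++) {
            int from = start + (int) ((long) g * length / groups);      // first frame of group g
            int to = start + (int) ((long) (g + 1) * length / groups);  // one past its last frame
            keys[g] = from + (to - from - 1) / 2;                        // middle frame of the group
        }
        return keys;
    }
}
```

For example, a 10-frame shot starting at frame 0 split into two groups yields key frames 2 and 7, regardless of what those frames actually show, which illustrates why the selection is described as arbitrary.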

[0007] In the copending application of Ratakonda, Serial No. 08/994,558, filed December 19, 1997, shot boundaries are determined by monitoring variations in the differences in image histograms over time. Individual shots are further partitioned into segments which represent highly homogeneous groups of frames. The partitioning of shots into segments is achieved through an iterative optimization process. For each video segment, the frame differing most from the key frame of the prior segment is selected as the next key frame of the summary. A key frame is selected on the basis of the frame's difference from the prior key frame and not on the basis of its representation of the other frames belonging to the segment. Like the technique proposed by Lagendijk, this technique must be performed off-line, and an entire video shot must be stored for review, segment partitioning, and key frame selection. Additional memory is required to store the prior key frame for comparison.

[0008] Zhang et al., U.S. Patent No. 5,635,982, disclose a method in which the difference between frames is monitored and accumulated. When the accumulated difference exceeds a predefined threshold, a potential key frame is detected. The potential key frame is designated as a key frame if, in addition, the difference between the potential key frame and the previous key frame exceeds a preset threshold. Without additional processing, the locations of key frames always coincide with the beginning of a new shot.
[0009] Smith et al., in a paper entitled VIDEO SKIMMING AND CHARACTERIZATION THROUGH THE COMBINATION OF IMAGE AND LANGUAGE UNDERSTANDING TECHNIQUES, and Mauldin et al., U.S. Patent No. 5,664,227, disclose an elaborate key frame identification technique based on context rules related to repetitiveness, degrees of motion, and audio and video content. The key frame sequences can be used to provide compact summaries of video sequences, but the method is complex and does not support creation of hierarchical video summaries.

[0010] What is desired is a technique of automated video content analysis and key frame extraction which selects key frames that are the most representative frames of each shot or segment of the video sequence. Simple implementation, conservation of computational resources, and the ability to accept a variety of inputs are desirable characteristics of such a technique. It is desired that the technique provide for content analysis and key frame extraction both "on-line (in real time)," without the need to store the entire video sequence, and "off-line." Further, a technique of conveniently creating a hierarchy of summaries, each successively containing a smaller number of the most representative frames, is desired.



SUMMARY OF THE INVENTION

[0011] The present invention overcomes the aforementioned drawbacks of the prior art by providing a method and apparatus for digital video content analysis and extraction based on analysis of feature vectors corresponding to the frames of a video sequence. In the first embodiment of the invention, a method is provided for identifying a key video frame within a segment of video having frames of relatively homogeneous content, including the steps of characterizing each video frame as a feature vector; identifying a key feature vector that minimizes the distortion of the group of feature vectors; and identifying the video frame corresponding to the key feature vector as the key video frame. Key frames selected by the method of this first embodiment of the present invention are the frames which are the most representative of the content of the set of frames in each shot of a sequence.

[0012] In the second embodiment, a method is provided for determining the boundaries of a video segment within a video sequence, comprising the steps of defining a threshold distortion; locating a first frame in the video segment; defining a first feature vector representative of the first frame; including the first feature vector in a set of segment feature vectors; defining a next feature vector representative of a subsequent video frame; including the next feature vector in the set of segment feature vectors; calculating the distortion of the set of segment feature vectors resulting from including the next feature vector in the set; and comparing the distortion of the set of segment feature vectors with the threshold distortion. The steps of characterizing subsequent frames as feature vectors, adding feature vectors to the set, calculating the distortion, and comparing the distortion with the threshold are repeated until the distortion of the set of segment feature vectors has achieved some predefined relationship to the threshold distortion, thereby defining the second boundary of the segment. Prior receipt and storage of the entire video sequence are not required for the segmentation process. Key frames can be identified simultaneously with segmentation of the video by applying the methods of either the first or the third embodiment.

[0013] In the third embodiment of the present invention, a method is provided for creating summaries of video sequences including more than one key frame from each segment, comprising the steps of dividing the video frames of the sequence into at least one video segment of relatively homogeneous content including at least one video frame; defining a feature vector representative of each of the video frames; ranking the feature vectors representing the frames included in each video segment according to the relative distortion produced in the set of feature vectors representing the segment by each feature vector included in the set; and including in the summary of the sequence the video frames represented by the feature vectors producing relative distortion of specified ranks. Utilizing the method of this third embodiment, a hierarchy of key frames can be identified from which hierarchical summaries of a video sequence can be created, with each summary including a greater number of the most representative frames from each segment.
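A minimal sketch of how such hierarchical summaries could be assembled, assuming the segment boundaries are already known, the distortion of each frame is its unit-weight cumulative L1 distance (q = 1) to the other frames of its own segment, and summary level L keeps the L lowest-distortion frames of every segment. The class and method names are hypothetical, and breaking ties by temporal order is only one possible choice.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch: level-L summary = frames of rank 1..L from every segment. */
final class HierarchicalSummary {

    /** Cumulative L1 distance (unit weights) from frame j to every frame of [from, to). */
    static double distortion(double[][] x, int from, int to, int j) {
        double d = 0.0;
        for (int i = from; i < to; i++) {
            for (int r = 0; r < x[j].length; r++) {
                d += Math.abs(x[j][r] - x[i][r]);
            }
        }
        return d;
    }

    /** Frame indices of summary level `level` (level >= 1), in temporal order. */
    static List<Integer> summary(double[][] x, int[] segmentStarts, int level) {
        List<Integer> out = new ArrayList<>();
        for (int s = 0; s < segmentStarts.length; s++) {
            int from = segmentStarts[s];
            int to = (s + 1 < segmentStarts.length) ? segmentStarts[s + 1] : x.length;
            int n = to - from;
            double[] d = new double[n];
            Integer[] order = new Integer[n];
            for (int k = 0; k < n; k++) {
                d[k] = distortion(x, from, to, from + k);
                order[k] = k;
            }
            // Rank 1 = least distortion; the stable sort keeps tied frames in temporal order.
            Arrays.sort(order, Comparator.comparingDouble(idx -> d[idx]));
            int[] chosen = new int[Math.min(level, n)];
            for (int k = 0; k < chosen.length; k++) {
                chosen[k] = from + order[k];
            }
            Arrays.sort(chosen); // present each segment's selected frames in time order
            for (int f : chosen) {
                out.add(f);
            }
        }
        return out;
    }
}
```

Each higher level is a superset of the level below it, which is what makes the summaries hierarchical: level 1 holds one most representative frame per segment, level 2 adds the next most representative frame, and so on.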

[0014] The foregoing and other objectives, features and advantages of the invention will be more readily understood upon consideration of the following detailed description of the invention, taken in conjunction with the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

[0015]

FIG. 1 illustrates the method of identifying key frames of the present invention.

FIG. 2A is a flow chart illustrating one technique for performing the method of key frame identification of the first embodiment of the invention.

FIG. 2B is a continuation of the flow chart of FIG. 2A.

FIG. 2C is a continuation of the flow chart of FIG. 2B.

FIG. 2D is a continuation of the flow chart of FIG. 2C.

FIG. 3 illustrates the method of identifying segment boundaries and key frames of the second embodiment of the present invention.

FIG. 4A is a flow chart illustrating one technique for performing the method of segment boundary and key frame identification of the second embodiment of the invention.

FIG. 4B is a continuation of the flow chart of FIG. 4A.

FIG. 4C is a continuation of the flow chart of FIG. 4B.

FIG. 4D is a continuation of the flow chart of FIG. 4C.

FIG. 5 illustrates the method of identifying a hierarchy of key frames of the third embodiment of the present invention.

FIG. 6 is a schematic representation of an exemplary video sequence where each frame within a segment has been ranked according to the relative distortion of the segment produced by the frame.

FIG. 7A is a flow chart illustrating one technique for performing the method of key frame ranking and compilation of hierarchical summaries of the third embodiment of the invention.

FIG. 7B is a continuation of the flow chart of FIG. 7A.

FIG. 7C is a continuation of the flow chart of FIG. 7B.

FIG. 7D is a continuation of the flow chart of FIG. 7C.

FIG. 7E is a continuation of the flow chart of FIG. 7D.

FIG. 8 is a schematic illustration of a video sequence summarizing appliance.



DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT



[0016] Generally, a video sequence includes a series of scenes. Each scene, in turn, includes a series of adjoining video segments or "shots." A shot is a relatively homogeneous series of consecutive individual frames produced by a single camera focusing on an object or objects of interest. Generally, automated video content analysis and extraction involve "viewing" the video sequence, dividing the sequence into a series of shots, and selecting one or more "key frames" to represent the content of each of the shots. Displaying or providing a list of the key frames summarizes the video sequence.

[0017] The content of a frame of digital video can be represented in several ways. A common representation is an image histogram obtained by calculating the frequency distribution of picture elements or picture areas as a function of some property of the element or area. For example, a color histogram of an image is the frequency distribution of picture elements or areas as a function of the color of the element or area. The image histogram can, in turn, be represented as a vector signal including a combination of separate components, each carrying information about a different property of the histogram, otherwise known as a "feature vector." Other image representations can also be captured by feature vectors. Feature vectors, for example, can be used to represent the average intensity of an image; an ordered set of image samples; an ordered, fixed set or subset of linear or nonlinear combinations of samples from an image; a luminance, chrominance, or combined luminance and chrominance sample value histogram; or an ordered set of statistical or deterministic, or a combination of statistical and deterministic, parameter values from an image or an image histogram.
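As an illustration of a color histogram used as a feature vector, the following sketch assumes pixels packed as `0xRRGGBB` integers and uniform quantization of each channel to `levels` levels, giving levels^3 bins normalized so that they sum to one. The bin layout, the normalization, and the class name are illustrative assumptions; any of the other representations listed above could serve equally well as the feature vector.

```java
/** Hypothetical sketch: normalized RGB color histogram as an R-dimensional feature vector. */
final class ColorHistogramFeature {

    /** Quantizes each channel of 0xRRGGBB pixels to `levels` levels and returns the
     *  relative frequency of each of the levels^3 color bins. */
    static double[] featureVector(int[] rgbPixels, int levels) {
        double[] h = new double[levels * levels * levels];
        for (int p : rgbPixels) {
            int r = ((p >> 16) & 0xFF) * levels / 256;   // quantized red
            int g = ((p >> 8) & 0xFF) * levels / 256;    // quantized green
            int b = (p & 0xFF) * levels / 256;           // quantized blue
            h[(r * levels + g) * levels + b] += 1.0;
        }
        for (int i = 0; i < h.length; i++) {
            h[i] /= rgbPixels.length;                    // counts -> frequency distribution
        }
        return h;
    }
}
```

Normalizing by the pixel count keeps feature vectors of differently sized frames comparable under the distance measures discussed below.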

[0018] In the present invention, the frames of a video sequence are preferably represented as multidimensional feature vectors. A vector rank filter, or a more specialized version of the vector rank filter, the vector median filter, is used to determine the relative cumulative distances from each feature vector in a set to the other feature vectors in the set. The cumulative distance from a feature vector to all other feature vectors in the set measures the "distortion" of the set of vectors as evaluated from the vector under investigation. The distortion indicates the homogeneity of the content of a frame of video, characterized by the corresponding feature vector, with the content of all other frames of video in the set. The output of the vector rank filter is used to resolve the homogeneity of the video's content. The vector rank filter permits identification of the video frame that is most representative of the content of the frames of a set. It also facilitates determining when the content of a next frame in a series differs substantially from the preceding frames in the series, or ranking the frames of a set according to the relative homogeneity of content of the frames or distortion of the set by each frame. Finally, the vector rank filter enables an on-line implementation of a video summarizer, meaning that representative video frames are generated as more video frames or video fields are input into the system. An on-line implementation offers advantages over an off-line implementation, where typically all video frames must be acquired and stored before they can be processed. The biggest advantage may be that a "smart" receiver capable of monitoring and evaluating its own resources (computing resources, memory resources, communication bandwidth resources, ...) may choose to tune the parameters of the on-line summarizing algorithm in response to instantaneous variations of the resources. In effect, an on-line implementation allows a receiver to make smart trade-offs in time, thereby producing video summarization results which may be viewed as optimal with respect to video content and resource availability at the time of processing.

[0019] The vector rank filter can be used to rank the vectors of a set of vectors in terms of the cumulative distance from a vector to the other vectors of the set, or the relative distortion of the set caused by each vector. Given a set of P vectors $x_h$, where $1 \le h \le P$, the output of the vector rank filter will be a vector $x_{(m)}$, which has the m-th smallest distortion value among the P distortion values. If $D_q^{(m)}$, where $1 \le m \le P$ and where q identifies the distance measure, is the distortion of rank m (ranked in increasing values of m), then $D_q^{(1)}$ corresponds to the minimum distortion, that is, the distortion of rank one. Likewise, $D_q^{(2)}$ corresponds to the second smallest distortion. The distortion $D_q^{(P)}$ corresponds to the maximum distortion, and $D_q^{(m)}$ corresponds to the m-th ranked smallest distortion. The vector characterized by the least cumulative distance is $x_{(1)}$, the vector characterized by the second smallest cumulative distance is $x_{(2)}$, and so forth. The vector characterized by the greatest cumulative distance is $x_{(P)}$. For the vector $x_h$, the distortion equals:

$$D_q^{h} = \sum_{i=1}^{P} w_{h,i}\,\lVert x_h - x_i \rVert_q$$

where $w_{h,i}$ is a weighting factor associated with the joint use of the vector $x_h$ and the vector $x_i$, and q specifies the distance or distortion measure. The choice of q may be determined by application considerations. For example, summing the absolute differences between the vectors (q = 1) is computationally less intensive than summing the squares of the differences (q = 2). In other applications, q might be selected to optimize the distortion with respect to certain input vector statistics. Likewise, the weight $w_{h,i}$ might be selected to represent the relative importance of input feature vectors or to implement particular rank-ordered statistics.
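The ranking defined above can be sketched directly from the formula for $D_q^h$. This is a minimal, quadratic-time illustration, not the patent's implementation: the class and method names are hypothetical, `w` may be `null` to mean unit weights, $\lVert\cdot\rVert_q$ is taken to be the Minkowski L_q distance, and `Double.POSITIVE_INFINITY` selects the maximum-component (L-infinity) distance.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical sketch of a vector rank filter over feature vectors x[0..P-1]. */
final class VectorRankFilter {

    /** Minkowski L_q distance between two R-dimensional vectors (q may be infinite). */
    static double distance(double[] a, double[] b, double q) {
        double acc = 0.0;
        for (int r = 0; r < a.length; r++) {
            double diff = Math.abs(a[r] - b[r]);
            if (Double.isInfinite(q)) {
                acc = Math.max(acc, diff);
            } else {
                acc += Math.pow(diff, q);
            }
        }
        return Double.isInfinite(q) ? acc : Math.pow(acc, 1.0 / q);
    }

    /** D_q^h = sum_i w[h][i] * ||x_h - x_i||_q for every h; w == null means unit weights. */
    static double[] distortions(double[][] x, double[][] w, double q) {
        int p = x.length;
        double[] d = new double[p];
        for (int h = 0; h < p; h++) {
            for (int i = 0; i < p; i++) {
                double weight = (w == null) ? 1.0 : w[h][i];
                d[h] += weight * distance(x[h], x[i], q);
            }
        }
        return d;
    }

    /** Vector indices ordered by rank: element m-1 is the index of x_(m). */
    static int[] rankOrder(double[][] x, double[][] w, double q) {
        double[] d = distortions(x, w, q);
        Integer[] idx = new Integer[x.length];
        for (int i = 0; i < idx.length; i++) {
            idx[i] = i;
        }
        Arrays.sort(idx, Comparator.comparingDouble(k -> d[k]));  // increasing distortion
        return Arrays.stream(idx).mapToInt(Integer::intValue).toArray();
    }
}
```

For the three vectors (0, 0), (3, 4) and (6, 8) with q = 2 and unit weights, the distortions are 15, 10 and 15, so (3, 4) receives rank one; the stable sort leaves the tie between the other two in index order, whereas [0020] below describes how tied ranks may instead be shared.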
[0020] In the case of ties in rank, the same rank may be allocated to all vectors sharing the same distortion value. In this case, the next rank used would be adjusted to reflect the fact that more than one vector occupies the "tied" rank. For example, if the distortions $D_q^{i}$, $D_q^{j}$, and $D_q^{m}$ are equal and represent the second ranked distortion, the respective vectors $x_i$, $x_j$, and $x_m$ can all be assigned the rank of three (the average of the rank values two, three, and four). The vector producing the next smallest distortion would be assigned the rank five to reflect the fact that three vectors are sharing ranks two, three, and four. An alternative approach to dealing with ties would be to assign a fractional rank to vectors sharing the same distortion. For example, if two vectors produce equal distortion, ranked two, the vectors might be assigned the rank 5/2 [(2+3)/2 = 5/2] to indicate the tie.
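Both tie rules above amount to giving every member of a tied run the average of the rank positions the run occupies, with the following vector continuing after the whole run. The sketch below is a hypothetical illustration of that averaging; it treats distortions as tied only when they are exactly equal, which is an assumption that may need a tolerance in practice.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical sketch of averaged ranks for tied distortions. */
final class TiedRanks {

    /** Returns rank[h] for each vector h (1 = least distortion); tied vectors share the
     *  average of the positions they occupy. */
    static double[] averageRanks(double[] distortion) {
        int p = distortion.length;
        Integer[] order = new Integer[p];
        for (int i = 0; i < p; i++) {
            order[i] = i;
        }
        Arrays.sort(order, Comparator.comparingDouble(idx -> distortion[idx]));
        double[] rank = new double[p];
        int start = 0;
        while (start < p) {
            int end = start;
            // Extend the run while the next vector has exactly the same distortion.
            while (end + 1 < p && distortion[order[end + 1]] == distortion[order[start]]) {
                end++;
            }
            // Positions start..end hold ranks (start + 1)..(end + 1); assign their average.
            double avg = (start + 1 + end + 1) / 2.0;
            for (int k = start; k <= end; k++) {
                rank[order[k]] = avg;
            }
            start = end + 1;
        }
        return rank;
    }
}
```

With distortions {1, 5, 5, 5, 7} this yields ranks {1, 3, 3, 3, 5}, matching the first example above, and with {1, 3, 3} it yields {1, 5/2, 5/2}, matching the fractional example.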

[0021] A specialized vector rank filter, the vector median filter, is particularly useful in identifying the most representative frame of a video segment or shot. The vector rank filter can also be used in determining the boundaries of a shot or segment. The output of the vector median filter identifies the vector producing the minimum distortion among all vectors in a set of vectors. In other words, the vector median filter identifies the vector of rank one as identified by the vector rank filter. Given a set of P vectors $x_j$, where $1 \le j \le P$, the output of the vector median filter, $x_k$, is such that the index k satisfies:

$$k = \arg\min_{1 \le j \le P} \left\{ D_q^{j} \right\}$$

where the distortion or cumulative distance $D_q^{j}$, for $q = 1, 2, \ldots, \infty$, is defined as

$$D_q^{j} = \sum_{i=1}^{P} w_{j,i}\,\lVert x_j - x_i \rVert_q$$

and where $w_{j,i}$ denotes a weighting factor applied when vector $x_j$ and vector $x_i$ are used jointly, and q specifies the distortion or distance measure.

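[0022] For example, if the vectors x are R-dimensional vectors with components $x_{i,r}$, where $1 \le r \le R$, and if the weight value $w_{j,i}$ and the distortion or distance measure q are equal to 1, the minimum distortion is:

$$D_1^{(1)} = \min_{1 \le j \le P} \sum_{i=1}^{P} \sum_{r=1}^{R} \left| x_{i,r} - x_{j,r} \right|$$

For q = 2, the distortion is:

$$D_2^{j} = \sum_{i=1}^{P} \left( \sum_{r=1}^{R} \left( x_{i,r} - x_{j,r} \right)^2 \right)^{1/2}$$

For $q = \infty$, the distortion is:

$$D_\infty^{j} = \sum_{i=1}^{P} \max_{1 \le r \le R} \left| x_{i,r} - x_{j,r} \right|$$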

[0023] The output of the vector median filter is the vector which is globally the "closest," relative to the distortion measure q, to all the other vectors within the set of P vectors. The output vector can be considered to be the most representative vector of all the P input vectors in the set, since it is associated with the least distortion within the vector set. The video frame corresponding to this feature vector can be considered to be the most representative of the set of video frames associated with the set of input feature vectors.
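For the q = 1, unit-weight case of [0022], the vector median filter reduces to the argmin of a double sum of absolute component differences. The sketch below is a hypothetical, self-contained illustration of that special case; when several vectors share the minimum distortion it returns the earliest one, which is an arbitrary choice.

```java
/** Hypothetical sketch of the vector median filter with q = 1 and unit weights. */
final class VectorMedianFilter {

    /** Returns k = argmin_j sum_i sum_r |x[i][r] - x[j][r]|, the most representative vector. */
    static int medianIndex(double[][] x) {
        int best = -1;
        double bestD = Double.POSITIVE_INFINITY;
        for (int j = 0; j < x.length; j++) {
            double d = 0.0;                          // cumulative distance D_1^j
            for (int i = 0; i < x.length; i++) {
                for (int r = 0; r < x[j].length; r++) {
                    d += Math.abs(x[i][r] - x[j][r]);
                }
            }
            if (d < bestD) {                         // strict '<' keeps the earliest vector on ties
                bestD = d;
                best = j;
            }
        }
        return best;
    }
}
```

Unlike a component-wise median, the output is always one of the input vectors, so it always corresponds to an actual frame that can be shown as the key frame.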

[0024] Referring to FIG. 1, in a first embodiment of the present invention, video segments 10 are input to a vector median filter 12. The vector median filter 12 identifies the key frames 14 that are the most representative of the video segment 10, where the boundaries of the segment 10 have been established by the methods of the present invention or otherwise. The key frame 14 from each of a plurality of segments may be used as a summary of the video sequence comprising that plurality of segments. For example, assume a jth segment of video with two identified time boundaries, times t(j) and t(j+1), where j = 1, 2, ..., N. The jth segment contains M frames of video, which are denoted here as F(t(j)), F(t(j)+1), ..., F(t(j)+M-1). The video frames can be defined as a set of R-dimensional feature vectors denoted by μ(t(j)), μ(t(j)+1), ..., μ(t(j)+M-1), where the vector μ(t(j)+i) is associated with the frame F(t(j)+i) and where 0 ≤ i ≤ M-1. The next video segment (the (j+1)th segment) starts at time t(j+1) such that t(j+1) = t(j) + M, where j < N.
[0025] The application of the vector median filter 12 to the vectors μ(t(j)), μ(t(j)+1), ..., μ(t(j)+M-1) permits identification of the vector producing the least distortion among the set of vectors corresponding to the video segment 10. This vector will correspond to the frame, belonging to the set of frames F(t(j)), F(t(j)+1), ..., F(t(j)+M-1), which is most representative of the content of the frames in the segment 10. A key frame 14 can be determined in the same fashion for each segment in the video sequence. The resulting series of key frames 14 constitutes a summary of the video sequence, where each key frame is the most representative frame of each segment. While the number of frames M may vary from segment to segment, the overall number of key frames identified by this embodiment can be controlled if the input frame rate is known.
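The first embodiment can be sketched as applying the vector median filter once per segment. In this hypothetical illustration the segment start times t(j) are given as increasing 0-based frame indices, each segment is assumed to be non-empty, the feature vectors μ are plain `double[]` arrays, and the distortion uses q = 1 with unit weights; any other q or weighting from [0019] could be substituted.

```java
/** Hypothetical sketch: one key frame per segment via the vector median filter. */
final class SegmentKeyFrames {

    /** mu[f] is the feature vector of frame f; t[j] is the first frame of segment j, and
     *  segment j ends just before t[j + 1] (or at the last frame for the final segment). */
    static int[] keyFrames(double[][] mu, int[] t) {
        int[] keys = new int[t.length];
        for (int j = 0; j < t.length; j++) {
            int from = t[j];
            int to = (j + 1 < t.length) ? t[j + 1] : mu.length;   // t(j+1) = t(j) + M
            double best = Double.POSITIVE_INFINITY;
            for (int c = from; c < to; c++) {
                double d = 0.0;   // cumulative L1 distance from frame c to its segment
                for (int i = from; i < to; i++) {
                    for (int r = 0; r < mu[c].length; r++) {
                        d += Math.abs(mu[c][r] - mu[i][r]);
                    }
                }
                if (d < best) {
                    best = d;
                    keys[j] = c;
                }
            }
        }
        return keys;
    }
}
```

For two segments of one-dimensional features {0, 1, 5} and {20, 30, 21}, the sketch selects frames 1 and 5, that is, the frame in each segment closest overall to its neighbours rather than the first or middle frame.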

[0026] FIGS. 2A through 2D illustrate a flow chart for one technique of performing the method of key frame identification of the first embodiment of the present invention, where the feature vector utilized is based on an image histogram. The variables used in the flow chart include: j, the number of the first frame of the segment; b, the segment number; acc_dif, the distortion or cumulative distance between the current frame and the prior frames under consideration; min_dif, the current minimum cumulative distortion; n, the current frame; m, the number of frames being considered (from the beginning of the segment to the current frame n); vm(b), the vector median for the bth segment; P(b), the number of frames in the bth segment; L, the number of segments in the video sequence; and HUGE, a large number that cannot be obtained by any cumulative distance.
[0027] In the method for identifying a key frame of this embodiment:

[0028] First, a system for performing the method sets j and b to 1 (step S1). Then, the system computes h(j), the histogram of image j, and sets the cumulative distortion acc_dif(j) to 0 (step S2). Next, for the bth video segment starting at the jth video frame, the system sets min_dif to HUGE (step S3). Setting n = j + 1 (step S4), the system acquires image n, calculates histogram h(n), and sets the cumulative distortion acc_dif(n) to 0 (step S5). Setting m = j (step S6), the system computes dif, the sum of the absolute bin (histogram entry) value differences between histogram h(m) and histogram h(n) (step S7).

[0029] The system updates the cumulative distortion values for histograms n and m, acc_dif(n) += dif and acc_dif(m) += dif (step S8). If acc_dif(m) < min_dif (step S9), the system sets min_dif = acc_dif(m) and vm(b) = m (step S10). If m = n - 1 (step S11), then when acc_dif(n) < min_dif (step S13), the system sets min_dif = acc_dif(n) and vm(b) = n (step S14). If not, setting m = m + 1 (step S12), the process returns to step S7. Next, if n = P(b) (the segment length P(b) is assumed known) (step S15), then when b = L (L is the number of segments) (step S17), the system starts displaying sequences of representative frames (step S19) and sets b = 1 (step S20) and p = 1 (step S21). In step S15, if n is not equal to P(b), the system sets n = n + 1 (step S16) and returns to step S5. In step S17, if b is not equal to L, the system sets b = b + 1 and j = n + 1 (step S18) and returns to step S2.
[0030] If p = vm(b) (step S22), the system displays the video frame (step S23). Next, if p = P(b) (step S24) and b = L (step S26), the system stops the process of the method. If p is not equal to P(b), the system sets p = p + 1 (step S25) and returns to step S22. In step S26, if b is not equal to L, the system sets b = b + 1 (step S27) and returns to step S21.
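The accumulation of steps S5 through S14 can be sketched as an on-line object that receives one histogram at a time. In this hypothetical illustration each new histogram h(n) is compared once with every earlier histogram h(m) of the segment, and dif is added to both acc_dif(n) and acc_dif(m), exactly as in step S8. As a deliberate simplification, the sketch recomputes vm(b) as a plain argmin over the current acc_dif values instead of tracking a running min_dif, so the result stays correct as earlier acc_dif values keep growing; `Long.MAX_VALUE` plays the role of HUGE. Class and method names are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the per-segment acc_dif bookkeeping of FIGS. 2A-2D. */
final class IncrementalVectorMedian {
    private final List<int[]> hist = new ArrayList<>();  // h(j) .. h(n) of the current segment
    private final List<Long> accDif = new ArrayList<>(); // acc_dif for each of those frames

    /** Adds histogram h(n) and updates the cumulative distortion of every frame. */
    void addFrame(int[] h) {
        long accN = 0;                                    // acc_dif(n) starts at 0 (step S5)
        for (int m = 0; m < hist.size(); m++) {
            long dif = 0;
            int[] hm = hist.get(m);
            for (int bin = 0; bin < h.length; bin++) {
                dif += Math.abs(hm[bin] - h[bin]);        // step S7: sum of absolute bin differences
            }
            accDif.set(m, accDif.get(m) + dif);           // step S8: acc_dif(m) += dif
            accN += dif;                                  // step S8: acc_dif(n) += dif
        }
        hist.add(h.clone());
        accDif.add(accN);
    }

    /** vm(b): 0-based position within the segment of the frame with the least acc_dif. */
    int vectorMedian() {
        int vm = -1;
        long minDif = Long.MAX_VALUE;                     // stands in for HUGE
        for (int m = 0; m < accDif.size(); m++) {
            if (accDif.get(m) < minDif) {
                minDif = accDif.get(m);
                vm = m;
            }
        }
        return vm;
    }

    /** Current acc_dif of the frame at position m of the segment. */
    long cumulativeDistance(int m) {
        return accDif.get(m);
    }
}
```

Each new frame costs one histogram comparison per frame already in the segment, and no frame images need to be kept, only their histograms and running sums.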
[0031] Referring to FIG. 3, in the second embodiment of the present invention, a video processor 40 receives a video sequence 42. The video processor 40 produces an automatic selection of key frames 44, similar to FIG. 1, such that the cumulative frame difference between the key frame and the other frames within the segment does not exceed a given level of distortion. This ensures that the overall distortion within a segment remains bounded relative to the most representative video frame. This criterion also provides a technique to determine the video segment boundaries 46 "on-line," simultaneously with the key frames 44.

[0032] In the second embodiment of the present invention, the boundaries 46 of the video segments do not need to be pre-specified. To determine the boundaries 46 of a video segment, the boundaries 46 are designated as a variable which is calculated "on-line" as each successive frame of video is received and "viewed." While video comprises consecutive frames, a sampling technique can be used to select frames which are subsequent but not necessarily consecutive. When the distortion or cumulative distance resulting from the addition of a last frame to the set exceeds some predefined threshold for the first time, the boundary ending the segment is detected. For example, a consecutive set of video frames, F(t(j)), F(t(j)+1), ..., F(t(j+1)-1), F(t(j+1)), corresponding to a time period t(j), t(j)+1, ..., t(j+1)-1, t(j+1), is to be segmented. A first segment boundary is assumed to have been established by this method at time t(j). The segment starting at time t(j) is segment j. Since video comprises a set of consecutive images, the first image of the first segment is assumed to correspond to the first frame of the sequence. The first segment boundary of any subsequent segment can be determined by successively locating the second boundary of each segment which follows the first segment. As each successive frame is received, a feature vector is computed and added to the feature vector set beginning with μ(t(j)), corresponding to frame F(t(j)). The minimum distortion of the set of vectors is evaluated with the vector median filter. If the cumulative minimum distance, $D_q^{(1)}$, calculated by the vector median filter when applied to frames F(t(j)), ..., F(t(j+1)), exceeds some predefined threshold T for the first time when frame F(t(j+1)) is added to the set of vectors, the non-inclusive, ending segment boundary for segment j is declared to be at time t(j+1). In other words, the time t(j+1) represents the beginning of the next video segment (segment j+1) if the minimum cumulative distance is less than the threshold, $D_q^{(1)}\big(t(j+1)-1\big) < T$, for the frame immediately preceding this frame (frame F(t(j+1)-1)), but the minimum cumulative distortion exceeds the threshold, $D_q^{(1)}\big(t(j+1)\big) \ge T$, when frame F(t(j+1)) is added to the set.
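The on-line segmentation described above can be sketched as follows. This hypothetical illustration recomputes the minimum cumulative distance $D_1^{(1)}$ (q = 1, unit weights) of the open segment after every arriving frame; the first time it reaches the threshold T, the segment is closed without the new frame, its vector median is reported as the key frame, and the new frame starts the next segment. Recomputing all distances each time is an assumption made for brevity; an incremental accumulator such as the one in the flow chart of FIGS. 2A-2D would be cheaper.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of threshold-based on-line segmentation with key frame selection. */
final class OnlineSegmenter {
    private final double threshold;                               // T
    private final List<double[]> segment = new ArrayList<>();    // vectors of the open segment
    private int segmentStart = 0;                                 // t(j) as a frame index
    private int frameIndex = 0;                                   // index of the arriving frame
    final List<Integer> boundaries = new ArrayList<>();           // detected t(j+1) values
    final List<Integer> keyFrames = new ArrayList<>();            // key frame of each closed segment

    OnlineSegmenter(double threshold) {
        this.threshold = threshold;
    }

    /** Cumulative L1 distance of every vector in `set` to all vectors of the set. */
    private static double[] distortions(List<double[]> set) {
        double[] d = new double[set.size()];
        for (int a = 0; a < set.size(); a++) {
            for (int b = 0; b < set.size(); b++) {
                double[] u = set.get(a), v = set.get(b);
                for (int r = 0; r < u.length; r++) {
                    d[a] += Math.abs(u[r] - v[r]);
                }
            }
        }
        return d;
    }

    private static int argmin(double[] d) {
        int k = 0;
        for (int i = 1; i < d.length; i++) {
            if (d[i] < d[k]) {
                k = i;
            }
        }
        return k;
    }

    /** Feeds the feature vector of the next frame. */
    void addFrame(double[] mu) {
        segment.add(mu);
        double[] d = distortions(segment);
        if (segment.size() > 1 && d[argmin(d)] >= threshold) {
            // D(t(j+1)) >= T for the first time: close segment j without the new frame.
            segment.remove(segment.size() - 1);
            keyFrames.add(segmentStart + argmin(distortions(segment)));
            boundaries.add(frameIndex);
            segment.clear();
            segment.add(mu);                                       // new frame opens segment j+1
            segmentStart = frameIndex;
        }
        frameIndex++;
    }
}
```

With T = 10 and one-dimensional features 0, 1, 2, 50, 51, a boundary is declared at frame 3 and frame 1 is reported as the key frame of the first segment. In this sketch the key frame of the segment that is still open is only reported once that segment closes.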
[0033] As an alternative criterion, a shot boundary may be declared when the cumulative distance associated with the most recent frame exceeds some function of the minimum distance in the set of vectors, such as the product of the minimum distance and some factor k which is greater than one. In other words, a new segment is detected when:

$$D_q^{t(j+1)-1} < k\,D_q^{(1)}\big(t(j+1)-1\big) \quad \text{and} \quad D_q^{t(j+1)} \ge k\,D_q^{(1)}\big(t(j+1)\big), \qquad k > 1$$

[0034] The threshold T or the factor k may be selected as a function of the noise power in the image, as a function of a target compression ratio to be achieved by the video summary, or as some combination of noise power and target compression ratio.

[0035] When segment boundaries are detected, the key frame for the segment is the output of the vector median filter applied to the frames of the segment, as described in the first embodiment. As a result, summaries can be produced "on-line": the shot boundaries are detected and a key frame within each shot is simultaneously selected as the frames of video are "viewed", without the requirement to store the entire video sequence for review and processing. Further, a shot containing a single frame can be identified by the method.

[0036] Referring to FIGS. 4A through 4D, a flow chart illustrates a technique of implementing the method of the second embodiment of the present invention, where the feature vectors are image histograms and a shot boundary is declared when the maximum cumulative distance in a candidate video segment exceeds the minimum distortion calculated by the vector median filter multiplied by the distance multiplier k. The variables are the same as those described above for FIG. 2. An additional variable, previous_vm(b), is used to denote the index of the previously selected vector median for segment b. This variable records the position of the key frame once a segment boundary has been detected. In the Java code of Table A, the parameter k is time-varying and decreases in time, such that a segment boundary is always generated within a certain, predefined number of frames. The Java code also includes a shot boundary detection which detects homogeneous portions of the input video sequence. Here, a shot consists of several segments, each segment being represented by a keyframe. The Java code also includes an additional mechanism that monitors how many times the same vector median value is consecutively selected. This mechanism is put in place to ensure that keyframes result from selecting the same vector median for some duration in time. Table A provides sample source code written in the Java programming language for implementing this technique of the second embodiment.
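The time-varying multiplier used by Table A can be read off its frac_num and frac_deno variables: k = seg_maxlen(1 + seg_thresh) / (seg_maxlen + seg_thresh * seg_len). It falls from 1 + seg_thresh when a segment starts to exactly 1 when seg_len reaches seg_maxlen. Because the vector median has the smallest cumulative distance in the set, a multiplier of 1 makes the boundary test succeed, which is how the listing bounds the segment length. The sketch below restates that arithmetic; the class and method names are hypothetical, and it uses long arithmetic where the listing uses int.

```java
// Hypothetical restatement of the time-varying multiplier in Table A.
final class TimeVaryingK {
    static final int SEG_MAXLEN = 45;   // seg_maxlen in Table A
    static final int SEG_THRESH = 17;   // seg_thresh in Table A

    // k = frac_num / frac_deno for the current segment length seg_len.
    static double k(int segLen) {
        long fracNum = (long) SEG_MAXLEN * (1 + SEG_THRESH);
        long fracDeno = SEG_MAXLEN + (long) SEG_THRESH * segLen;
        return (double) fracNum / fracDeno;
    }

    // Division-free boundary test in the form used by the listing:
    // frac_deno * acc_dif[newest] >= frac_num * acc_dif[vector median].
    static boolean isBoundary(long accNewest, long accMedian, int segLen) {
        long fracNum = (long) SEG_MAXLEN * (1 + SEG_THRESH);
        long fracDeno = SEG_MAXLEN + (long) SEG_THRESH * segLen;
        return fracDeno * accNewest >= fracNum * accMedian;
    }
}
```

With the listing's constants, k(0) = 810/45 = 18 and k(45) = 810/810 = 1.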
[0037] In the method of this embodiment:

[0038] First, a system for performing the method sets j and b to 1 (step S31). Then the system computes h(j), the histogram of image j (step S32), and sets the cumulative distortion acc_dif(j) and previous_vm(b) to zero (step S33). Next, for the bth video segment starting at the jth video frame, the system sets min_dif = HUGE (step S34). Setting n = j + 1 (step S35), if the end of the video sequence has been reached (step S36), the system stores P(b) and sets L = b (step S37). If not, the system acquires image n, calculates histogram h(n) and sets the cumulative distortion acc_dif(n) to 0 (step S38). Setting m = j (step S39), the system computes dif, the sum of the absolute bin (histogram entry) value differences between histogram h(m) and histogram h(n) (step S40).

[0039] The system updates the cumulative distortion values of histograms n and m by adding dif to each, acc_dif(n) += dif and acc_dif(m) += dif (step S41). If acc_dif(m) < min_dif (step S42), the system sets min_dif = acc_dif(m) and vm(b) = m (step S43). If m = n - 1 (step S44) and acc_dif(n) < min_dif (step S46), the system sets min_dif = acc_dif(n) and vm(b) = n (step S47). In step S44, if m is not equal to n - 1, the system sets m = m + 1 (step S45) and the process returns to step S40. Next, if acc_dif(n) > k * min_dif (step S48), the system sets the end of the segment to n - 1, sets P(b) = n - j and stores P(b), and sets vm(b) = previous_vm(b) (step S50). Then, setting b = b + 1 and j = n (step S51), the process returns to step S33. If not (step S48), the system sets n = n + 1 and previous_vm(b) = vm(b) (step S49), and the process returns to step S36.
[0040] Next, the system starts displaying sequences of representative frames (step S52). Setting b = 1 (step S53) and p = 1 (step S54), if p = vm(b) (step S55), the system displays the video frame (step S56). Next, if p = P(b) (step S57) and b = L (step S59), the system stops the process of the method. If p is not equal to P(b), the system sets p = p + 1 (step S58) and returns to step S55. In step S59, if b is not equal to L, the system sets b = b + 1 (step S60) and returns to step S54.
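The segmentation loop of FIG. 4 (steps S33 to S51) can be sketched as a batch routine over precomputed histograms. This is a hypothetical illustration: the names Fig4Segmenter and Segment are assumptions, and one deliberate deviation is marked in a comment. The minimum min_dif is reset for every new frame n, as Table A does with its pbest variable, so that vm(b) always refers to the current vector median. The key frame stored for each segment is previous_vm(b), the vector median of the frames before the one that triggered the boundary.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical batch version of the FIG. 4 segmentation loop.
final class Fig4Segmenter {
    static final class Segment {
        final int start;     // j, first frame of segment b
        final int length;    // P(b)
        final int keyFrame;  // vm(b) = previous_vm(b)

        Segment(int start, int length, int keyFrame) {
            this.start = start;
            this.length = length;
            this.keyFrame = keyFrame;
        }
    }

    // Step S40: sum of absolute bin differences between two histograms.
    static long dif(int[] a, int[] b) {
        long d = 0;
        for (int i = 0; i < a.length; i++) {
            d += Math.abs(a[i] - b[i]);
        }
        return d;
    }

    static List<Segment> segment(int[][] h, double k) {
        List<Segment> out = new ArrayList<>();
        long[] acc = new long[h.length];
        int j = 0;
        while (j < h.length) {
            acc[j] = 0;                              // S33
            int prevVm = j;                          // previous_vm(b)
            int n = j + 1;                           // S35
            for (; n < h.length; n++) {              // S36: stop at end of sequence
                acc[n] = 0;                          // S38
                long minDif = Long.MAX_VALUE;        // deviation: reset per frame, like pbest in Table A
                int vm = j;
                for (int m = j; m < n; m++) {        // S39 to S45
                    long d = dif(h[m], h[n]);
                    acc[n] += d;                     // S41
                    acc[m] += d;
                    if (acc[m] < minDif) {           // S42, S43
                        minDif = acc[m];
                        vm = m;
                    }
                }
                if (acc[n] < minDif) {               // S46, S47
                    minDif = acc[n];
                    vm = n;
                }
                if (acc[n] > k * minDif) {           // S48: frame n opens the next segment
                    break;
                }
                prevVm = vm;                         // S49
            }
            out.add(new Segment(j, n - j, prevVm)); // S50, or S37 at the end of the sequence
            j = n;                                   // S51
        }
        return out;
    }
}
```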
[0041] Referring to FIG. 5, in a third embodiment of the present invention a hierarchical process provides a video skimming methodology based on the frame ranking results produced by a vector rank filter 70. The process identifies a hierarchy of frames 74 within a video segment 72 which is the input to the vector rank filter 70. Hierarchical summaries can be constructed by skimming different numbers of key frames from each video segment on the basis of the key frame's rank in the hierarchy of key frames 74. The method can easily produce a hierarchical series of summaries, with each successive summary containing a greater number of the most representative frames from each shot. The result is a video-content-driven temporal sub-sampling mechanism based on the rank values produced by the vector rank filter. FIG. 6 illustrates an exemplary video sequence of eleven frames 60 which has been segmented into four segments 62 by the methods of the present invention or otherwise. The vector rank filter is used to generate the rank 64 of the frames 60 within each segment 62 according to the distortion or cumulative distance produced by the frame 60 in the set of the frames of the segment 62. To skim the video, frames 60 can be selected for inclusion in a summary on the basis of the relative cumulative distance produced by the frame in the video segment 62 of which the frame 60 is a member. Skimming the video at level one would include the frames of rank one 66 in the coarsest summary. A level two summary could include frames ranked one 66 and two 68. At each level of the hierarchy, additional frames 60 would be assigned to the summary until all of the frames 60 of the sequence are included in the summary of maximum rank. While each increase in rank adds frames 60 to the summary which are less representative of the content of the segment 62, each summary comprises those frames 60 which are the most representative of the sequence. FIGS. 7A through 7E illustrate a flow chart of a technique of implementing this third embodiment of the present invention. The variables used in FIGS. 7A through 7E are those identified above for FIG. 4.
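The ranking and level-by-level skimming can be sketched as follows, assuming segment boundaries are already known (given as start indices), int[] histograms as feature vectors and the L1 distance; the class and method names are hypothetical. Within each segment, every frame's cumulative distance to the other frames of that segment is computed and the frames are ordered by increasing distance, so rank 1 is the most representative frame. A level-r summary keeps every frame whose rank is at most r. Ties are broken by frame order, which is an arbitrary choice.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of vector-rank-filter ranking and level-r skimming.
final class RankSkimmer {
    static long dif(int[] a, int[] b) {
        long d = 0;
        for (int i = 0; i < a.length; i++) {
            d += Math.abs(a[i] - b[i]);
        }
        return d;
    }

    // Returns rank[f] (1 = most representative) of every frame within its segment.
    // segStarts holds the first frame index of each segment, in increasing order.
    static int[] ranks(int[][] h, int[] segStarts) {
        int[] rank = new int[h.length];
        for (int s = 0; s < segStarts.length; s++) {
            int start = segStarts[s];
            int end = (s + 1 < segStarts.length) ? segStarts[s + 1] : h.length;
            long[] acc = new long[end - start];   // cumulative distance inside the segment
            for (int a = start; a < end; a++) {
                for (int b = a + 1; b < end; b++) {
                    long d = dif(h[a], h[b]);
                    acc[a - start] += d;
                    acc[b - start] += d;
                }
            }
            Integer[] order = new Integer[end - start];
            for (int i = 0; i < order.length; i++) {
                order[i] = i;
            }
            // Increasing cumulative distance; the stable sort keeps frame order on ties.
            Arrays.sort(order, (x, y) -> Long.compare(acc[x], acc[y]));
            for (int r = 0; r < order.length; r++) {
                rank[start + order[r]] = r + 1;
            }
        }
        return rank;
    }

    // Level-r summary: every frame whose rank within its segment is at most r.
    static List<Integer> skim(int[] rank, int level) {
        List<Integer> out = new ArrayList<>();
        for (int f = 0; f < rank.length; f++) {
            if (rank[f] <= level) {
                out.add(f);
            }
        }
        return out;
    }
}
```

Storing the rank array alongside the frame indices is enough to replay any level later without recomputing distances.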
[0042] In the method of this embodiment:

[0043] First, a system for performing the method sets j and b to 1, and pmax to 0 (step S71). Then the system computes h(j), the histogram of image j (step S72), and sets the cumulative distortion acc_dif(j) to zero and previous_vm(b) = j (step S73). Next, for the bth video segment starting at the jth video frame, the system sets min_dif = HUGE (step S74). Setting n = j + 1 (step S75), if the end of the video sequence has been reached (step S76), the system stores P(b) and sets L = b (step S77). If not (step S76), the system acquires image n, calculates histogram h(n) and sets the cumulative distortion acc_dif(n) to 0 (step S78). Setting m = j (step S79), the system computes dif, the sum of the absolute bin (histogram entry) value differences between histogram h(m) and histogram h(n) (step S80).

[0044] The system updates the cumulative distortion values of histograms n and m by adding dif to each, acc_dif(n) += dif and acc_dif(m) += dif (step S81). If acc_dif(m) < min_dif (step S82), the system sets min_dif = acc_dif(m) and vm(b) = m (step S83). Next, if m = n - 1 (step S84) and acc_dif(n) < min_dif (step S86), the system sets min_dif = acc_dif(n) and vm(b) = n (step S87). If not (step S84), the system sets m = m + 1 (step S85) and the process returns to step S80. Next, in step S88, if acc_dif(n) > k * min_dif, the system sets the end of the segment to n - 1, stores P(b), sets P(b) = n - j and sets vm(b) = previous_vm(b) (step S90). Then the system ranks the values acc_dif(j), ..., acc_dif(n-1) in increasing order, stores the rank value associated with every video frame and sets the length of the segment P(b) = n - j (step S91). In step S92, if the largest rank value > pmax, the system sets pmax = largest rank value (step S93). Next, setting b = b + 1 and j = n (step S94), the process returns to step S73. If not (step S88), the system sets n = n + 1 and previous_vm(b) = vm(b) (step S89), and the process returns to step S76.

[0045] Following step S77, the system sets r = 1 (step S95) and starts displaying sequences of frames having rank greater than or equal to r (step S96). Then, setting b = 1 (step S97) and p = 1 (step S98), if the rank of image p in segment number b is not greater than or equal to r (step S99), the system displays the video frame (step S100). Next, if p = P(b) (step S101), b = L (step S103) and r = pmax (step S105), the system stops the process of the method. In step S101, if p is not equal to P(b), the system sets p = p + 1 (step S102) and returns to step S99. In step S103, if b is not equal to L, the system sets b = b + 1 (step S104) and returns to step S98. In step S105, if r is not equal to pmax, the system sets r = r + 1 (step S106) and returns to step S96.

[0046] A video skimming system can be constructed to display a skimmed video summary of frames of any specified range of ranks. This effectively provides temporal compression of the original video sequence. For this purpose, the distortion rank of each of the video frames must be stored so the display system can identify the video frames to be included in the designated summary. Index and rank information can be stored in a database with the video sequence to provide temporal, hierarchical descriptions of a video sequence.

[0047] A system displaying hierarchical summaries is an alternative to showing video frames following a linear "low temporal frequency" to "high temporal frequency" decomposition. In each segment, the rank 1 output of the vector rank filter can be viewed as the "average" key frame, since it is the most representative frame of all frames of the segment. Conversely, the video frame of highest rank can be viewed as the frame displaying the most content detail, since it is the least representative frame of the segment. The system is nonlinear, with the benefit that temporal filtering is not required. Table B illustrates a computer program in the Java source language for generating hierarchical video summaries. In addition to performing automatic segment boundary detection, the program applies the vector median process of embodiment 1 to the keyframes belonging to the same shot. In effect, the Java source code provided in Table B provides a way to generate a second, coarser summary of the video sequence, where only one keyframe is generated per video shot. As in the source code of Table A, the fine and coarse video summaries are calculated on-line.
[0048] In connection with identification of key frames utilizing the techniques of this invention, hierarchical summaries of key frames can be generated utilizing other methods, including a Linde-Buzo-Gray clustering algorithm (also known as a K-means algorithm) as described in the paper AN ALGORITHM FOR VECTOR QUANTIZER DESIGN, IEEE Transactions on Communications, January 1980; or a method such as that described in the co-pending application of Ratakonda, Serial No. 08/994,558, filed December 19, 1997, which relies on the rate of change of images in consecutive frames to constrain the clustering process.
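As a point of comparison for the clustering alternative, the sketch below groups frame histograms with plain k-means (Lloyd iterations) and keeps, for each cluster, the frame closest to the cluster centroid. It is not the full Linde-Buzo-Gray procedure of the cited paper, which grows the codebook by splitting, nor the rate-constrained method of the Ratakonda application; the class name, the evenly spaced initial centroids, the squared Euclidean distance and the fixed iteration count are all assumptions.

```java
// Hypothetical comparison sketch: plain k-means over frame histograms.
final class KMeansKeyFrames {
    // Squared Euclidean distance between a centroid and a histogram.
    static double sq(double[] c, int[] x) {
        double s = 0;
        for (int i = 0; i < x.length; i++) {
            double d = x[i] - c[i];
            s += d * d;
        }
        return s;
    }

    // Returns, for each of k clusters, the frame closest to its centroid (-1 if empty).
    static int[] keyFrames(int[][] h, int k, int iterations) {
        int n = h.length;
        int dim = h[0].length;
        double[][] c = new double[k][dim];
        for (int q = 0; q < k; q++) {            // evenly spaced initial centroids (assumption)
            int[] seed = h[q * n / k];
            for (int i = 0; i < dim; i++) {
                c[q][i] = seed[i];
            }
        }
        int[] label = new int[n];
        for (int it = 0; it < iterations; it++) {
            for (int f = 0; f < n; f++) {        // assignment step
                int best = 0;
                for (int q = 1; q < k; q++) {
                    if (sq(c[q], h[f]) < sq(c[best], h[f])) {
                        best = q;
                    }
                }
                label[f] = best;
            }
            for (int q = 0; q < k; q++) {        // update step: centroid = mean of members
                double[] sum = new double[dim];
                int count = 0;
                for (int f = 0; f < n; f++) {
                    if (label[f] == q) {
                        count++;
                        for (int i = 0; i < dim; i++) {
                            sum[i] += h[f][i];
                        }
                    }
                }
                if (count > 0) {
                    for (int i = 0; i < dim; i++) {
                        c[q][i] = sum[i] / count;
                    }
                }
            }
        }
        int[] key = new int[k];
        for (int q = 0; q < k; q++) {
            int best = -1;
            for (int f = 0; f < n; f++) {
                if (label[f] == q && (best < 0 || sq(c[q], h[f]) < sq(c[q], h[best]))) {
                    best = f;
                }
            }
            key[q] = best;
        }
        return key;
    }
}
```

Running it with increasing k gives progressively larger summaries, although, unlike the rank-based hierarchy, the key frames chosen at one level need not reappear at the next.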

[0049] FIG. 8 illustrates a summarizing appliance 80 for constructing summaries of video sequences according to the methods of the three embodiments of the present invention. A video sequence comprising a plurality of video fields or frames 82 is input to a receiver 84 of the appliance. The frames 82, or some representations thereof, are transferred from the receiver 84 to a vector converter 86 which characterizes each frame as a feature vector. As successive feature vectors representing successive frames 82 of the video sequence are defined, the feature vectors are accumulated in an accumulator 88 in an expanding set of vectors. The distortions or cumulative distance measures of the set of vectors are calculated in the vector filter 90 as new feature vectors are added to the set. The vector filter 90 ranks the feature vectors according to the distortion that each feature vector produces in the set of feature vectors in the accumulator 88. If video segments have not been identified in the input frames 82, the cumulative distance associated with the most recent vector added to the set is calculated in the vector filter 90 and compared to some threshold value in a comparator 92. When the cumulative distance associated with the feature vector most recently added to the set has attained some predefined relationship to the threshold value, a segment boundary is declared. When a segment boundary is declared in the comparator 92, or if segment boundaries are defined in the input frames 82, the output of the vector filter 90 is used in the skimmer 94 to identify and record the frames that are most representative of the content of the video sequence. If the vector filter 90 is based on the more specialized vector median filter, the skimmer 94 can identify the most representative frame as a key frame 96 in each video segment. A vector filter 90 based on the more general vector rank filter may be used to rank frames according to the relative homogeneity of their content or their representativeness of the content of a video segment. The video skimmer 94 can identify frames according to a hierarchy of representativeness based on ranking the feature vectors according to relative cumulative distance or according to a clustering technique. With this technique, multiple key frames 96 can be selected from each segment and hierarchical summaries of the plurality of frames 82 can be created.

[0050] The terms and expressions that have been employed in the foregoing specification are used as terms of description and not of limitation, and there is no intention, in the use of such terms and expressions, of excluding equivalents of the features shown and described or portions thereof, it being recognized that the scope of the invention is defined and limited only by the claims that follow.

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TABLE A 


/** Java applet implementing on-line video summarization **/
/** using the vector median filter **/

import java.awt.*;
import java.applet.*;
import java.net.*;
import java.awt.image.*;

public class vm_applet extends Applet implements Runnable
{
    private final int OX = 260;
    private final int OY = 0;
    private final int REFRESH = 100;
    private final int LOOP_WAIT = 4999;
    private final int down_off = 145;
    private final int left_off = 0 - 250;
    private final int HISTLEN = 256;
    private final int MAX_KEYFRAMES = 5;

    private final double alpha = 3.5;
    private final int minShotlength = 10;
    private final int seg_maxlen = 45;
    private final int seg_thresh = 17;
    private final int min_runlen = 7;

    private static int xPos = 40;
    private static int yPos = 40;
    private static int pbest_init;

    private String[] msgs;
    private Image[] imgs;
    private Image[] KeyFrame_imgs;
    private int img_count = 0;

    Graphics currg = null;
    ImageObserver io;

    private static int paint_flag;
    private static String str;
    private int width;
    private int height;

    static int loaded_count = 0;
    // set by the image-loading thread in imageUpdate() and polled by run(),
    // hence volatile
    static volatile boolean imgs_all_loaded = false;

    private int[][] hist;
    private int[] store_lowseg;
    private int[] store_hiseg;
    private int[] acc_dif;
    private int count_segs = 0;
    private boolean[] flag_shot;
    private MemoryImageSource[] theKeyFrames;
    private int ScanBase;
    private Image theLightGrayImage;

    private double mu = 0.0;
    private double sigma = 0.0;
    private int counter = 0;
    private int count_vm = 0;
    private int prev_count_vm = 0;
    private int seg_len = 0;
    private int vm_index = 0;
    private int prev_vm_index = 0;
    private int ScanKeyFrames = 0;
    private int prev = 0;
    private int base = 0;

    Thread t = null;

    public void init()
    {
        String num_images;

        num_images = getParameter("imageCount");
        img_count = Integer.valueOf(num_images).intValue();
        msgs = new String[img_count];

        for (int k = 0; k < img_count; k++)
        {
            msgs[k] = getParameter("image" + String.valueOf(k));
        }

        // Prepare screen display environment
        Rectangle ctn = getParent().bounds();
        setBackground(Color.lightGray);
        resize(ctn.width, ctn.height);
        currg = getGraphics();
        imgs = new Image[img_count];

        KeyFrame_imgs = new Image[MAX_KEYFRAMES];

        hist = new int[img_count][];
        store_lowseg = new int[img_count];
        store_hiseg = new int[img_count];
        acc_dif = new int[img_count];
        flag_shot = new boolean[img_count];

        // Provision memory for histograms
        for (int k = 0; k < img_count; k++)
        {
            hist[k] = new int[HISTLEN];
        }

        theKeyFrames = new MemoryImageSource[img_count];
        theLightGrayImage = MakeLightGrayImage(90, 60);

        // HISTLEN entries in histogram
        // 256 is max sample value difference
        pbest_init = img_count * HISTLEN * 256;
    }



    // Utilities for displaying video sequence and
    // keyframes

    public void paint(Graphics g, int xcoord, int ycoord, Image frame, int fwidth,
                      int fheight, String caption)
    {
        if (g == null)
        {
            g = getGraphics();
        }

        g.setFont(new Font("Arial", Font.BOLD, 11));

        io = null;

        g.drawImage(frame, xcoord, ycoord, Color.black, io);
        g.setColor(Color.yellow);
        g.fillRect(xcoord, ycoord + fheight, fwidth, 15);
        g.setColor(Color.blue);
        g.drawString(caption, xcoord, ycoord + fheight + 10);
    }


    public void paint(Graphics g, int xcoord, int ycoord, Image[] keyframe, int
                      kfwidth, int kfheight, int entryBase, int KeyFrameCount, int[] lowBound, int[]
                      highBound, boolean[] shot_detect)
    {
        int index;
        int shiftx;

        if (g == null)
        {
            g = getGraphics();
        }

        g.setFont(new Font("Arial", Font.BOLD, 11));
        io = null;

        shiftx = 0;
        index = 0;

        for (int kf = entryBase; kf < entryBase + MAX_KEYFRAMES; kf++)
        {
            g.drawImage(keyframe[index++], xcoord + shiftx, ycoord, Color.black, io);

            if (kf < KeyFrameCount)
            {
                // caption: "#<low> [<low>-<high>]"
                str = " #" + String.valueOf(lowBound[kf]);
                str += " [";
                str += String.valueOf(lowBound[kf]);
                str += "-";
                str += String.valueOf(highBound[kf]);
                str += "]";

                // green caption marks a keyframe that starts a new shot
                if (shot_detect[kf] == true)
                {
                    g.setColor(Color.green);
                }
                else
                {
                    g.setColor(Color.yellow);
                }

                g.fillRect(xcoord + shiftx, ycoord + kfheight, kfwidth, 15);
                g.setColor(Color.red);
                g.drawString(str, xcoord + shiftx, ycoord + kfheight + 10);
            }
            else
            {
                g.setColor(Color.lightGray);
                g.fillRect(xcoord + shiftx, ycoord + kfheight, kfwidth, 15);
            }

            shiftx += (kfwidth + 10);
        }
    }

    public void paint(Graphics g)
    {
        g.setColor(Color.red);
        g.setFont(new Font("Arial", Font.BOLD, 11));

        switch (paint_flag)
        {
            case 0:
                // nothing to draw (drawString(null, ...) would throw
                // a NullPointerException)
                break;

            case 1:
                g.drawString(str, xPos, yPos);
                break;

            default:
                break;
        }
    }



    public void start()
    {
        if (t == null)
        {
            t = new Thread(this);
            t.start();
        }
    }

    public void run()
    {
        int scan_imgs;
        boolean new_keyframe_detected;
        Image[] ref_images;
        Image[] ref_keyframes;
        int ref_width;
        int ref_height;
        int ref_keyframeCount;
        int[] ref_segmentLowBoundaries;
        int[] ref_segmentHighBoundaries;
        boolean[] ref_shotBoundariesIndicator;
        int ref_displayedKeyframeIndexBase;
        int ref_seqlen;

        // Get image references
        for (int k = 0; k < img_count; k++)
        {
            try
            {
                imgs[k] = Toolkit.getDefaultToolkit().getImage(
                    new URL(getDocumentBase(), msgs[k]));

                str = "Getting Reference to image ";
                str += String.valueOf(k);
                Thread.sleep(10);
                repaint();
            }
            catch (Exception e)
            {
            }
        }

        // Load images
        paint_flag = 0;
        repaint();

        for (int k = 0; k < img_count; k++)
        {
            try
            {
                io = this;
                // drawing off-screen starts the asynchronous load;
                // imageUpdate() is notified as image data arrives
                currg.drawImage(imgs[k], -1000, -1000, Color.black, io);
                Thread.sleep(100);
            }
            catch (Exception e)
            {
            }
        }

        while (!imgs_all_loaded)
        {
            try
            {
                str = "Loading images... ";
                paint_flag = 1;
                xPos = 40;
                yPos = 40;
                repaint();
                Thread.sleep(REFRESH);   // avoid a busy wait while loading
            }
            catch (Exception e)
            {
            }
        }

        // Start video summarization
        while (true)
        {
            try
            {
                str = "7.4925 Hz Progressive 180x120 Video";
                paint_flag = 1;
                xPos = OX + 220;
                yPos = OY + 80;
                repaint();
            }
            catch (Exception e)
            {
            }

            ref_seqlen = getSequenceLength();

            // scan_imgs represents the number of video frames
            // in the set
            for (scan_imgs = 0; scan_imgs < ref_seqlen; scan_imgs++)
            {
                // summarize and see if segment boundary
                // has been detected
                new_keyframe_detected = summarize(scan_imgs);

                // Get references to arrays to prepare
                // display of video sequence and keyframes
                ref_images = getImages();
                ref_keyframes = getKeyframes();
                ref_width = getWidth();
                ref_height = getHeight();
                ref_keyframeCount = getKeyframeCount();

                ref_segmentLowBoundaries =
                    getSegmentLowBoundaries();

                ref_segmentHighBoundaries =
                    getSegmentHighBoundaries();

                ref_shotBoundariesIndicator =
                    getShotBoundariesIndicator();

                ref_displayedKeyframeIndexBase =
                    getDisplayedKeyframeIndexBase();

                // display
                try
                {
                    paint(currg, OX, OY, ref_images[scan_imgs],
                          ref_width, ref_height, msgs[scan_imgs]);

                    if (new_keyframe_detected == true)
                    {
                        paint(currg, OX + left_off, OY + down_off,
                              ref_keyframes, ref_width >> 1, ref_height >> 1, ref_displayedKeyframeIndexBase,
                              ref_keyframeCount, ref_segmentLowBoundaries, ref_segmentHighBoundaries,
                              ref_shotBoundariesIndicator);
                    }
                    else
                    {
                        Thread.sleep(REFRESH);
                    }
                }
                catch (Exception e)
                {
                }
            }

            try
            {
                Thread.sleep(LOOP_WAIT);
            }
            catch (Exception e)
            {
            }
        }
    }

    public boolean summarize(int globj)
    {
        int diftot;
        int difval;

        double currentMean;
        double threshold;
        int pbest;
        int frac_num;
        int frac_deno;
        boolean keyframe_flag;

        keyframe_flag = false;

        // wait until the dimensions of the frame are known
        width = 0;
        while ((width = imgs[globj].getWidth(null)) <= 0)
        {
        }

        height = 0;
        while ((height = imgs[globj].getHeight(null)) <= 0)
        {
        }

        // calculate histogram of new video frame
        GetHist(0, 0, width, height, globj);

        if (globj == 0)
        {
            // initialize cumulative distance registers
            for (int k = 0; k < img_count; k++)
            {
                acc_dif[k] = 0;
                flag_shot[k] = false;
            }

            mu = 0.0;
            sigma = 0.0;
            counter = 1;
            prev = 0;

            prev_count_vm = 0;
            count_vm = 0;
            count_segs = 0;
            seg_len = 0;
            vm_index = 0;
            prev_vm_index = 0;

            ScanKeyFrames = 0;
            ScanBase = 0;
            base = 0;

            for (int imgindex = 0; imgindex < MAX_KEYFRAMES; imgindex++)
            {
                KeyFrame_imgs[imgindex] = theLightGrayImage;
            }

            flag_shot[0] = true;
            return (keyframe_flag);
        }

        // Shot boundary detection: L1 distance between the
        // histograms of the current and the previous frame
        diftot = 0;

        for (int m = 0; m < HISTLEN; m++)
        {
            difval = hist[globj][m] - hist[globj - 1][m];
            if (difval > 0)
            {
                diftot += difval;
            }
            else
            {
                diftot -= difval;
            }
        }

        mu += diftot;
        sigma += ((double) diftot * diftot);

        currentMean = mu / (double) counter;
        threshold = currentMean;

        if (counter > 1)
        {
            // mean plus alpha times the sample standard deviation
            // (clamped at 0 to guard against rounding below zero)
            threshold += (alpha * Math.sqrt(Math.max(0.0, (sigma -
                ((double) counter * currentMean * currentMean)) / (double) (counter - 1))));
        }

        // This test determines whether a shot boundary is present
        if ((diftot > threshold) && (globj - prev > minShotlength))
        {
            // Shot boundary has been detected.
            // A keyframe will be generated
            prev = globj;
            mu = 0.0;
            sigma = 0.0;
            // restart the statistics with the same convention as for
            // frame 0; a value of 0 would divide by zero on the next frame
            counter = 1;

            flag_shot[count_segs + 1] = true;

            GetDecim(0, 0, width, height, base, count_segs);
            store_lowseg[count_segs] = base;
            store_hiseg[count_segs++] = globj - 1;
            base = globj;
            seg_len = 0;
            prev_vm_index = base;
            acc_dif[base] = 0;
            count_vm = 0;

            keyframe_flag = true;
        }
        else
        {
            counter++;

            // calculate vector median filter
            acc_dif[globj] = 0;
            pbest = pbest_init;

            // calculate cumulative distances
            for (int b = base; b < globj; b++)
            {
                diftot = 0;

                for (int m = 0; m < HISTLEN; m++)
                {
                    difval = hist[b][m] - hist[globj][m];

                    if (difval > 0)
                    {
                        diftot += difval;
                    }
                    else
                    {
                        diftot -= difval;
                    }
                }

                // Calculate cumulative distance for newest
                // feature vector
                acc_dif[globj] += diftot;

                // Update cumulative distance for the feature
                // vector in the set
                acc_dif[b] += diftot;

                // Keep track of minimum cumulative distance
                if (acc_dif[b] < pbest)
                {
                    vm_index = b;
                    pbest = acc_dif[b];
                }
            }

            // Keep track of minimum cumulative distance
            if (acc_dif[globj] < pbest)
            {
                vm_index = globj;
            }

            // keep track of how many times the same
            // video frame has been selected consecutively by the
            // vector median filter
            prev_count_vm = count_vm;
            if (vm_index == prev_vm_index)
            {
                count_vm++;
            }
            else
            {
                count_vm = 0;
            }

            // Calculate threshold value
            // frac_num is numerator and frac_deno is denominator
            frac_num = seg_maxlen * (1 + seg_thresh);
            frac_deno = seg_maxlen + (seg_thresh * seg_len);

            // Detect a segment boundary and issue a keyframe
            // if minimum run length has been exceeded
            if ((prev_count_vm > min_runlen) && (count_vm == 0))
            {
                GetDecim(0, 0, width, height, base, count_segs);
                store_lowseg[count_segs] = base;
                store_hiseg[count_segs++] = globj - 1;
                base = globj;
                seg_len = 0;
                prev_vm_index = base;
                acc_dif[base] = 0;
                keyframe_flag = true;
            }

            // Detect a segment boundary and issue a keyframe
            // if cumulative distance of newest feature vector
            // is larger than the threshold value multiplied by
            // the minimum cumulative distance of the set.
            else if ((frac_deno * acc_dif[globj]) >=
                     (frac_num * acc_dif[vm_index]))
            {
                GetDecim(0, 0, width, height, base, count_segs);
                store_lowseg[count_segs] = base;
                store_hiseg[count_segs++] = globj - 1;
                base = globj;
                seg_len = 0;
                prev_vm_index = base;
                acc_dif[base] = 0;
                count_vm = 0;

                keyframe_flag = true;
            }

            // No segment boundary detected
            else
            {
                seg_len++;
                keyframe_flag = false;
                prev_vm_index = vm_index;
            }
        }

        // Prepare display of the keyframe:
        // only MAX_KEYFRAMES are displayed on the screen
        if (keyframe_flag == true)
        {
            if (ScanKeyFrames == MAX_KEYFRAMES)
            {
                // Place new keyframe at beginning
                KeyFrame_imgs[0] = createImage(theKeyFrames[count_segs - 1]);

                // and make other placeholders gray
                for (int imgindex = 1; imgindex < MAX_KEYFRAMES; imgindex++)
                {
                    KeyFrame_imgs[imgindex] = theLightGrayImage;
                }

                ScanBase = count_segs - 1;
                ScanKeyFrames = 1;
            }
            else
            {
                // append new keyframe to display list
                KeyFrame_imgs[ScanKeyFrames++] = createImage(theKeyFrames[count_segs - 1]);
            }
        }

        // Take care of the last keyframe of the last segment
        // in the video sequence
        if (globj == img_count - 1)
        {
            if (keyframe_flag == false)
            {
                GetDecim(0, 0, width, height, base, count_segs);
                store_lowseg[count_segs] = base;
                store_hiseg[count_segs++] = globj;

                keyframe_flag = true;
            }
            else
            {
                // the last frame opened a new, single-frame segment
                GetDecim(0, 0, width, height, globj, count_segs);
                store_lowseg[count_segs] = globj;
                store_hiseg[count_segs++] = globj;
            }

            if (ScanKeyFrames == MAX_KEYFRAMES)
            {
                KeyFrame_imgs[0] = createImage(theKeyFrames[count_segs - 1]);

                for (int imgindex = 1; imgindex < MAX_KEYFRAMES; imgindex++)
                {
                    KeyFrame_imgs[imgindex] = theLightGrayImage;
                }

                ScanKeyFrames = 1;
                ScanBase = count_segs - 1;
            }
            else
            {
                KeyFrame_imgs[ScanKeyFrames++] = createImage(theKeyFrames[count_segs - 1]);
            }
        }

        return (keyframe_flag);
    }

    // Utility for loading images

    public boolean imageUpdate(Image img, int flags, int x, int y, int w, int h)
    {
        if ((flags & (ERROR | ABORT)) != 0)
        {
            paint_flag = 1;
            str = "ERROR IN LOADING";
            xPos = 20;
            yPos = 10;
        }

        if (imgs_all_loaded)
        {
            return false;
        }

        if ((flags & ALLBITS) == 0)
        {
            return true;
        }

        if (++loaded_count == img_count)
        {
            imgs_all_loaded = true;
        }

        return false;
    }

    // Utility for computing an image histogram
    // (256-bin luminance histogram of frame n)

    private void computeHistogram(int[] pixels, int w, int h, int n)
    {
        int pixval;
        int r;
        int g;
        int b;
        int lum;

        int scan_index = 0;

        for (int k = 0; k < HISTLEN; k++)
        {
            hist[n][k] = 0;
        }

        for (int j = 0; j < h; j++)
        {
            for (int i = 0; i < w; i++)
            {
                pixval = pixels[scan_index++];
                r = (pixval >> 16) & 0xff;
                g = (pixval >> 8) & 0xff;
                b = (pixval) & 0xff;
                lum = (int) ((0.299 * r) + (0.587 * g) + (0.114 * b));
                hist[n][lum]++;
            }
        }

        return;
    }

    // Utility for down-sampling an image
    // (keeps every second pixel of every second row)

    private void computeDecim(int[] pixels, int w, int h, int[] decim)
    {
        int scan_index = 0;
        int decim_index = 0;

        for (int j = 0; j < h; j += 2)
        {
            for (int i = 0; i < w; i += 2)
            {
                decim[decim_index++] = pixels[scan_index];
                scan_index += 2;
            }

            scan_index += w;
        }

        return;
    }

    private void GetHist(int x, int y, int w, int h, int n)
    {
        int[] pixels = new int[w * h];

        PixelGrabber pg =
            new PixelGrabber(imgs[n], x, y, w, h, pixels, 0, w);

        try
        {
            pg.grabPixels();
        }
        catch (InterruptedException e)
        {
            System.err.println("interrupted waiting for pixels!");
            return;
        }

        if ((pg.status() & ImageObserver.ABORT) != 0)
        {
            System.err.println("image fetch aborted or errored");
            return;
        }

        computeHistogram(pixels, w, h, n);
    }

    private void GetDecim(int x, int y, int w, int h, int n, int u)
    {
        int[] pixels = new int[w * h];

        int[] decim_pix = new int[(w * h) >> 2];

        PixelGrabber pg =
            new PixelGrabber(imgs[n], x, y, w, h, pixels, 0, w);

        try
        {
            pg.grabPixels();
        }
        catch (InterruptedException e)
        {
            System.err.println("interrupted waiting for pixels!");
            return;
        }

        if ((pg.status() & ImageObserver.ABORT) != 0)
        {
            System.err.println("image fetch aborted or errored");
            return;
        }

        computeDecim(pixels, w, h, decim_pix);
        theKeyFrames[u] = new
            MemoryImageSource(w >> 1, h >> 1, decim_pix, 0, w >> 1);
        return;
    }

    private Image MakeLightGrayImage(int w, int h)
    {
        int[] pixels = new int[w * h];

        for (int j = 0; j < (w * h); j++)
        {
            pixels[j] = 0xFFB8B8B8;
        }

        return (createImage(new MemoryImageSource(w, h, pixels, 0, w)));
    }

    public Image[] getImages()
    {
        return (imgs);
    }

    public Image[] getKeyframes()
    {
        return (KeyFrame_imgs);
    }

    public int getWidth()
    {
        return (width);
    }

    public int getHeight()
    {
        return (height);
    }

    public int getKeyframeCount()
    {
        return (count_segs);
    }

    public int[] getSegmentLowBoundaries()
    {
        return (store_lowseg);
    }

    public int[] getSegmentHighBoundaries()
    {
        return (store_hiseg);
    }

    public boolean[] getShotBoundariesIndicator()
    {
        return (flag_shot);
    }

    public int getDisplayedKeyframeIndexBase()
    {
        return (ScanBase);
    }

    public int getSequenceLength()
    {
        return (img_count);
    }
}



EP1043664 A2 



TABLE B 


/** On-line video summarization based on vector median filter **/
/** A second, coarser summary is also generated **/
/** by applying the vector median filter to the keyframes **/
/** belonging to a same shot **/

import java.awt.*;
import java.applet.*;
import java.net.*;
import java.awt.image.*;

public class vm_applet extends Applet implements Runnable
{
    static final int OX = 260;
    static final int OY = 0;
    static final int REFRESH = 0;
    static final int LOOP_WAIT = 4999;
    static final int down_off = 145;
    static final int left_off = 0 - 255;
    static final int HISTLEN = 256;

    static int xPos = 40;
    static int yPos = 40;
    String num_images;
    String[] msgs;
    static Image[] imgs;
    static Image[] proc_imgs;
    int img_count = 0;
    Graphics currg = null;
    ImageObserver io;
    static int paint_flag;
    static String str;
    static int load_index = 0;
    int width;
    int height;

    static int loaded_count = 0;

    static boolean imgs_all_loaded = false;

    static int[] save_xPos;

    static int[] save_yPos;

    int[][] hist;

    static int globj;

    static int[] store_lowseg;

    static int[] store_hiseg;

    static int[] store_index;

    static int[] acc_dif;

    static int[] super_acc;

    int vm_index;

    int prev_vm_index;

    int shiftx = 0;

    int shifty = 0;

    int count_segs = 0;

    boolean keyframe_flag;

    static boolean[] flag_shot;

    int key_count;

    static int pbest_init;

    int super_base;

    int count_supers;

    static int[] super_vm_index;

    int die;

    Thread t = null;

    public void init() {
        num_images = getParameter("imageCount");
        img_count = Integer.valueOf(num_images).intValue();
        msgs = new String[img_count];
        for( int k = 0; k < img_count; k++)
        {
            msgs[k] = getParameter("image"+String.valueOf(k));
        }
        // Provision memory for various parameters
        Rectangle ctn = getParent().bounds();
        setBackground(Color.lightGray);
        resize(ctn.width, ctn.height);
        currg = getGraphics();
        imgs = new Image[img_count];
        proc_imgs = new Image[img_count];
        save_xPos = new int[img_count];
        save_yPos = new int[img_count];
        hist = new int[img_count][];
        store_lowseg = new int[img_count];
        store_hiseg = new int[img_count];
        acc_dif = new int[img_count];
        super_acc = new int[img_count];
        super_vm_index = new int[img_count];
        store_index = new int[img_count];
        flag_shot = new boolean[img_count];
        for( int k = 0; k < img_count; k++)
        {
            hist[k] = new int[HISTLEN];
        }
        // 256 entries in histogram
        // 256 is max sample value difference
        pbest_init = img_count * HISTLEN * 256;
    }

    // Utilities for displaying images and text on screen
    // Text (frame number associated with keyframe) is
    // in a yellow box if it is a regular keyframe
    // Text is in a green box if keyframe is also first
    // keyframe of a video shot
    // Image is surrounded by a blue frame if keyframe is
    // also keyframe at coarse summary level.
    public void paint(Graphics g)
    {
        if( currg == null )
        {
            currg = g;
        }
        g.setColor(Color.red);
        g.setFont( new Font("Arial", Font.BOLD, 11) );
        switch( paint_flag )
        {
            case 0:
                // Draw nothing (drawString(null, ...) would throw a NullPointerException)
                g.drawString("",0,0);
                break;
            case 1:
                g.drawString(str,xPos,yPos);
                break;
            case 2:
                io = null;
                for( int idx = 0; idx < load_index; idx++)
                {
                    g.drawImage(imgs[idx],save_xPos[idx],save_yPos[idx],Color.black,io);
                    g.drawString(msgs[idx],save_xPos[idx],save_yPos[idx]+120);
                }
                load_index += 1;
                break;
            case 3:
                io = null;
                break;
            case 4:
                io = null;
                break;
            case 5:
                io = null;
                g.drawImage(imgs[glob_i],xPos,yPos,Color.black,io);
                g.setColor(Color.yellow);
                g.fillRect(xPos,yPos+height,width,15);
                g.setColor(Color.red);
                g.drawString(msgs[glob_i],xPos,yPos+height+10);
                break;
            case 6:
                io = null;
                shiftx = 0;
                shifty = 0;
                key_count = 0;
                int ss = 0;
                for( int kf = 0; kf < count_segs; kf++)
                {
                    g.drawImage(proc_imgs[kf],xPos+left_off+shiftx,
                        yPos+shifty+down_off,Color.black,io);
                    str = " #" + String.valueOf( store_index[kf] );
                    str += " of (";
                    str += String.valueOf(store_lowseg[kf]);
                    str += ",";
                    str += String.valueOf(store_hiseg[kf]);
                    str += ")";
                    if( flag_shot[kf] == true )
                    {
                        g.setColor(Color.green);
                    }
                    else
                    {
                        g.setColor(Color.yellow);
                    }
                    g.setColor(Color.red);
                    g.drawString(str,xPos+left_off+shiftx,yPos+shifty+down_off+(height>>1)+10);
                    if( (ss < count_supers) && (kf == super_vm_index[ss]) )
                    {
                        // Blue frame around keyframes of the coarse summary
                        g.setColor(Color.blue);
                        g.fillRect(xPos+left_off+shiftx,yPos+down_off+15+shifty+(height>>1),(width>>1),2);
                        g.fillRect(xPos+left_off+shiftx,yPos+down_off+shifty-2,(width>>1),2);
                        g.fillRect(xPos+left_off+shiftx-2,yPos+down_off+shifty-2,2,(height>>1)+15+4);
                        g.fillRect(xPos+left_off+shiftx+(width>>1),
                            yPos+down_off+shifty-2,2,(height>>1)+15+4);
                        ss++;
                    }
                    shiftx += ((width>>1)+10);
                    if( ++key_count >= 7 )
                    {
                        key_count = 0;
                        shiftx = 0;
                        shifty += ((height>>1)+25);
                    }
                }
                break;
            default:
        }
    }

    public void start()
    {
        if( t == null )
        {
            t = new Thread(this);
            t.start();
        }
    }

    public void run()
    {
        int diftot;
        int difval;
        int base;
        double mu;
        double sigma;
        int counter;
        double alpha = 4.0;
        double minShotLength = 10;
        double currentMean;
        int prev;
        double threshold;
        int pbest;
        int count_vm;
        int prev_count_vm;
        // Maximum segment length
        // At least one keyframe will be generated
        // for that many input video frames
        int seg_maxlen = 45;
        // Initial value for calculating time-varying
        // threshold.
        int seg_thresh = 17;
        // Minimum number of times the same frame
        // must be consecutively selected by
        // median filter to become a keyframe
        int min_runlen = 7;
        int seg_len;
        double frac_num;
        double frac_deno;
        int super_pbest;
        int index;
        // get references to images from HTML document
        // publishing this applet
        paint_flag = 1;
        for( int k = 0; k < img_count; k++)
        {
            try
            {
                imgs[k] = Toolkit.getDefaultToolkit().getImage( new URL(
                    getDocumentBase(), msgs[k] ));
                str = "Getting Reference to image ";
                str += String.valueOf(k);
                repaint();
                // Thread.currentThread().sleep(100);
            }
            catch( Exception e )
            {
            }
        }
        paint_flag = 0;
        repaint();
        for( int k = 0; k < img_count; k++)
        {
            io = this;
            currg.drawImage(imgs[k],-1000,-1000,Color.black,io);
            try
            {
                Thread.sleep( 100 );
            }
            catch( Exception e )
            {
            }
        }
        while( !imgs_all_loaded )
        {
            try
            {
                str = "loading images...";
                paint_flag = 1;
                xPos = 40;
                yPos = 40;
                repaint();
                // Thread.sleep(10);
            }
            catch( Exception e)
            {
            }
        }



        while( true )
        {
            try
            {
                str = "7.4925 Hz Progressive 180x120 Video";
                paint_flag = 1;
                xPos = OX+220;
                yPos = OY + 80;
                repaint();
                //Thread.sleep(2000);
            }
            catch( Exception e )
            {
            }
            for( int k = 0; k < img_count; k++)
            {
                acc_dif[k] = 0;
                store_index[k] = 0;
                flag_shot[k] = false;
            }
            mu = 0.0;
            sigma = 0.0;
            counter = 1;
            prev = 0;
            prev_count_vm = 0;
            count_vm = 0;
            count_segs = 0;
            seg_len = 0;
            vm_index = 0;
            super_base = 0;
            count_supers = 0;
            // Start video summarization
            base = 0;
            for( glob_i = 0; glob_i < img_count; glob_i++)
            {
                try
                {
                    // getWidth()/getHeight() return -1 until the size is known
                    width = 0;
                    while( (width = imgs[glob_i].getWidth(null)) <= 0 )
                    {
                        //Thread.sleep(1);
                    }
                    height = 0;
                    while( (height = imgs[glob_i].getHeight(null)) <= 0 )
                    {
                        // Thread.sleep(1);
                    }
                    // Compute video frame histogram
                    GetHist(0,0,width,height,glob_i);
                    // initialization for coarse summary
                    if( glob_i == 0 )
                    {
                        xPos = OX;
                        yPos = OY;
                        paint_flag = 5;
                        paint(currg);
                        Thread.sleep( REFRESH );
                        prev_vm_index = 0;
                        flag_shot[0] = true;
                        super_base = 0;
                        count_supers = 0;
                        for( int z = 0; z < img_count; z++)
                        {
                            super_acc[z] = 0;
                        }
                        continue;
                    }

                    // For shot boundary detection
                    diftot = 0;
                    for( int m = 0; m < HISTLEN; m++)
                    {
                        difval = hist[glob_i][m] - hist[glob_i-1][m];
                        if( difval > 0 )
                        {
                            diftot += difval;
                        }
                        else
                        {
                            diftot -= difval;
                        }
                    }
                    mu += diftot;
                    sigma += (diftot * diftot);
                    currentMean = mu / (double) counter;
                    threshold = currentMean;
                    if( counter > 1 )
                    {
                        threshold += (alpha * Math.sqrt( (sigma -
                            ((double)counter*currentMean*currentMean)) / (double)(counter-1) ));
                    }

                    // Shot boundary has been detected if
                    // following test is true
                    if( (diftot > threshold) && (glob_i-prev > minShotLength) )
                    {
                        prev = glob_i;
                        mu = 0.0;
                        sigma = 0.0;
                        counter = 0;
                        flag_shot[count_segs+1] = true;
                        GetDecim(0,0,width,height,prev_vm_index,count_segs);
                        store_lowseg[count_segs] = base;
                        store_hiseg[count_segs] = glob_i-1;
                        store_index[count_segs] = prev_vm_index;
                        base = glob_i;
                        seg_len = 0;
                        prev_vm_index = base;
                        acc_dif[base] = 0;
                        count_vm = 0;
                        //dic-1;
                        keyframe_flag = true;
                        // Since shot boundary has been found,
                        // it is time to calculate most representative
                        // keyframe of all keyframes in the same shot
                        // The resulting keyframe is the keyframe
                        // that will appear in the coarser summary
                        // (one keyframe per video shot)
                        // The vector median filter is applied to
                        // all keyframes in the shot
                        // First calculate cumulative distances
                        // (each distance is measured to the keyframe
                        // just stored in store_index[count_segs])
                        for( int z = super_base; z < count_segs; z++)
                        {
                            index = store_index[z];
                            diftot = 0;
                            for( int m = 0; m < HISTLEN; m++)
                            {
                                difval = hist[index][m] - hist[store_index[count_segs]][m];
                                if( difval > 0 )
                                {
                                    diftot += difval;
                                }
                                else
                                {
                                    diftot -= difval;
                                }
                            }
                            super_acc[ count_segs ] += diftot;
                            super_acc[ z ] += diftot;
                        }
                        // Then identify minimum cumulative distance
                        super_pbest = pbest_init;
                        for( int z = super_base; z <= count_segs; z++)
                        {
                            if( super_acc[ z ] < super_pbest )
                            {
                                super_vm_index[count_supers] = z;
                                super_pbest = super_acc[ z ];
                            }
                        }
                        super_base = count_segs;
                        count_supers++;
                        count_segs++;
                    }
                    else
                    {
                        // No shot boundary has been detected
                        counter++;
                        // calculate output of vector median
                        // filter for current segment
                        acc_dif[ glob_i ] = 0;
                        pbest = pbest_init;
                        for( int b = base; b < glob_i; b++)
                        {
                            diftot = 0;
                            for( int m = 0; m < HISTLEN; m++)
                            {
                                difval = hist[b][m] - hist[glob_i][m];
                                if( difval > 0 )
                                {
                                    diftot += difval;
                                }
                                else
                                {
                                    diftot -= difval;
                                }
                            }
                            // Update cumulative distances
                            acc_dif[ glob_i ] += diftot;
                            acc_dif[ b ] += diftot;
                            // Keep track of minimal cumulative distance
                            if( acc_dif[b] < pbest )
                            {
                                vm_index = b;
                                pbest = acc_dif[ b ];
                            }
                        }
                        // Keep track of minimum cumulative
                        // distance
                        if( acc_dif[ glob_i ] < pbest )
                        {
                            vm_index = glob_i;
                        }

                        // Keep track of number of times
                        // same keyframe has been selected
                        // consecutively
                        prev_count_vm = count_vm;
                        if( vm_index == prev_vm_index )
                        {
                            count_vm++;
                        }
                        else
                        {
                            count_vm = 0;
                        }
                        // Calculate threshold numerator and
                        // denominator
                        frac_num = seg_maxlen*(1.0+seg_thresh);
                        frac_deno = seg_maxlen+(seg_thresh*seg_len);
                        if( (prev_count_vm > min_runlen) && (count_vm == 0) )
                        {
                            // Issue new keyframe as minimum
                            // run length for vector median output
                            // has been satisfied
                            GetDecim(0,0,width,height,prev_vm_index,count_segs);
                            store_lowseg[count_segs] = base;
                            store_hiseg[count_segs] = glob_i-1;
                            store_index[count_segs] = prev_vm_index;
                            base = glob_i;
                            seg_len = 0;
                            prev_vm_index = base;
                            acc_dif[base] = 0;
                            keyframe_flag = true;
                            // Update cumulative distances
                            // among keyframes of the current
                            // shot (for coarse summary)
                            for( int z = super_base; z < count_segs; z++)
                            {
                                index = store_index[z];
                                diftot = 0;
                                for( int m = 0; m < HISTLEN; m++)
                                {
                                    // distance to the keyframe just stored
                                    difval = hist[index][m] - hist[store_index[count_segs]][m];
                                    if( difval > 0 )
                                    {
                                        diftot += difval;
                                    }
                                    else
                                    {
                                        diftot -= difval;
                                    }
                                }
                                super_acc[ count_segs ] += diftot;
                                super_acc[ z ] += diftot;
                            }
                            count_segs++;
                        }
                        else if( (frac_deno*acc_dif[glob_i]) >= (frac_num*acc_dif[vm_index]) )
                        {
                            // Cumulative distance is greater
                            // than threshold multiplied by
                            // minimal cumulative distance;
                            // Time to generate a keyframe

                            GetDecim(0,0,width,height,prev_vm_index,count_segs);
                            store_lowseg[count_segs] = base;
                            store_hiseg[count_segs] = glob_i-1;
                            store_index[count_segs] = prev_vm_index;
                            base = glob_i;
                            seg_len = 0;
                            prev_vm_index = base;
                            acc_dif[base] = 0;
                            count_vm = 0;
                            keyframe_flag = true;
                            // Update cumulative distance
                            // among keyframes of current
                            // shot (for coarse summary)
                            for( int z = super_base; z < count_segs; z++)
                            {
                                index = store_index[z];
                                diftot = 0;
                                for( int m = 0; m < HISTLEN; m++)
                                {
                                    // distance to the keyframe just stored
                                    difval = hist[index][m] - hist[store_index[count_segs]][m];
                                    if( difval > 0 )
                                    {
                                        diftot += difval;
                                    }
                                    else
                                    {
                                        diftot -= difval;
                                    }
                                }
                                super_acc[ count_segs ] += diftot;
                                super_acc[ z ] += diftot;
                            }
                            count_segs++;
                        }
                        else
                        {
                            // No keyframe is generated
                            seg_len++;
                            keyframe_flag = false;
                            prev_vm_index = vm_index;
                            Thread.sleep(100);
                        }
                    }
                    xPos = OX;
                    yPos = OY;
                    paint_flag = 5;
                    paint(currg);
                    if( keyframe_flag == true )
                    {
                        paint_flag = 6;
                        paint(currg);
                    }
                    Thread.sleep( REFRESH );
                }
                catch( Exception e )
                {
                }
            }

            // Take care of last segment in video sequence
            if( keyframe_flag == false )
            {
                GetDecim(0,0,width,height,vm_index,count_segs);
                store_lowseg[count_segs] = base;
                store_hiseg[count_segs] = glob_i-1;
                store_index[count_segs] = vm_index;
            }
            else
            {
                GetDecim(0,0,width,height,glob_i-1,count_segs);
                store_lowseg[count_segs] = glob_i-1;
                store_hiseg[count_segs] = glob_i-1;
                store_index[count_segs] = glob_i-1;
            }
            // Take care of coarse summary for last shot
            // in the video sequence
            for( int z = super_base; z < count_segs; z++)
            {
                index = store_index[z];
                diftot = 0;
                for( int m = 0; m < HISTLEN; m++)
                {
                    // distance to the last keyframe, store_index[count_segs]
                    difval = hist[index][m] - hist[store_index[count_segs]][m];
                    if( difval > 0 )
                    {
                        diftot += difval;
                    }
                    else
                    {
                        diftot -= difval;
                    }
                }
                super_acc[ count_segs ] += diftot;
                super_acc[ z ] += diftot;
            }
            super_pbest = pbest_init;
            for( int z = super_base; z <= count_segs; z++)
            {
                if( super_acc[ z ] < super_pbest )
                {
                    super_vm_index[count_supers] = z;
                    super_pbest = super_acc[ z ];
                }
            }
            count_supers++;
            count_segs++;
            try
            {
                xPos = OX;
                yPos = OY;
                paint_flag = 6;
                paint(currg);
                Thread.sleep(LOOP_WAIT);
            }
            catch( Exception e )
            {
            }
            try
            {
                paint_flag = 0;
                paint( currg );
                //Thread.sleep( 10 );
            }
            catch( Exception e )
            {
            }
        }
    }

    // Utility for loading the images
    public boolean imageUpdate(Image img, int flags, int x, int y, int w, int h)
    {
        if( (flags & (ERROR | ABORT)) != 0 )
        {
            paint_flag = 1;
            str = "ERROR IN LOADING";
            xPos = 20;
            yPos = 10;
            repaint();
        }
        if( imgs_all_loaded )
        {
            return false;
        }
        if( (flags & ALLBITS) == 0 )
        {
            return true;
        }
        if( ++loaded_count == img_count )
        {
            imgs_all_loaded = true;
        }
        return false;
    }

    // Utility for computing an image histogram
    public void computeHistogram( int[] pixels, int w, int h, int n)
    {
        int pixval;
        int r;
        int g;
        int b;
        int lum;
        int scan_index = 0;
        for( int k = 0; k < HISTLEN; k++)
        {
            hist[n][k] = 0;
        }
        for (int j=0; j<h; j++)
        {
            for (int i=0; i<w; i++)
            {
                // pixval = pixels[(j*w)+i];
                pixval = pixels[scan_index++];
                r = (pixval >> 16) & 0xff;
                g = (pixval >> 8) & 0xff;
                b = (pixval ) & 0xff;
                lum = (int)((0.299*r)+(0.587*g)+(0.114*b));
                // Count the pixel in its luminance bin
                hist[n][lum]++;
            }
        }
        return;
    }

    // Utility to compute a down-sampled version of
    // an image. This is used to display the keyframes
    // on the screen.
    public void computeDecim( int[] pixels, int w, int h, int[] decim)
    {
        int scan_index = 0;
        int decim_index = 0;
        for (int j=0; j<h; j+=2)
        {
            for (int i=0; i<w; i+=2)
            {
                decim[decim_index++] = pixels[scan_index];
                scan_index += 2;
            }
            scan_index += w;
        }
        return;
    }

    public void GetHist(int x, int y, int w, int h, int n)
    {
        int[] pixels = new int[w*h];
        PixelGrabber pg =
            new PixelGrabber(imgs[n], x, y, w, h, pixels, 0, w);
        try
        {
            pg.grabPixels();
        }
        catch (InterruptedException e)
        {
            System.err.println("interrupted waiting for pixels!");
            return;
        }
        if ((pg.status() & ImageObserver.ABORT) != 0)
        {
            System.err.println("image fetch aborted or errored");
            return;
        }
        computeHistogram(pixels,w,h,n);
    }

    public void GetDecim(int x, int y, int w, int h, int n, int u)
    {
        int[] pixels = new int[w*h];
        int[] decim_pix = new int[(w*h)>>2];
        PixelGrabber pg =
            new PixelGrabber(imgs[n], x, y, w, h, pixels, 0, w);
        try
        {
            pg.grabPixels();
        }
        catch (InterruptedException e)
        {
            System.err.println("interrupted waiting for pixels!");
            return;
        }
        if ((pg.status() & ImageObserver.ABORT) != 0)
        {
            System.err.println("image fetch aborted or errored");
            return;
        }
        computeDecim(pixels,w,h,decim_pix);
        proc_imgs[u] = createImage(new
            MemoryImageSource(w>>1,h>>1,decim_pix,0,w>>1));
    }

    public boolean mouseDown(Event evt, int x, int y)
    {
        if(load_index < img_count)
        {
            save_xPos[load_index] = x;
            save_yPos[load_index] = y;
            paint_flag = 2;
            repaint();
        }
        else
        {
            paint_flag = 2;
            load_index = img_count-1;
            repaint();
        }
        return true;
    }
}
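The shot-boundary test in run() can be isolated from the applet. The L1 distance between the histograms of consecutive frames is folded into a running sum and sum of squares, and a new shot is declared when that distance exceeds the mean plus alpha times the sample standard deviation and the current shot is already longer than minShotLength frames. The Java sketch below restates that test under stated assumptions: the class and method names are hypothetical, histograms are passed in as precomputed int arrays instead of being grabbed from Image objects, and the running count is kept equal to the number of accumulated distances rather than copying the listing's counter bookkeeping exactly.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical standalone sketch of the shot-boundary test used in run().
class ShotBoundarySketch {

    // L1 distance (sum of absolute bin differences) between two histograms.
    static long l1(int[] a, int[] b) {
        long sum = 0;
        for (int m = 0; m < a.length; m++) {
            sum += Math.abs(a[m] - b[m]);
        }
        return sum;
    }

    // Returns the indices of the frames that start a shot; frame 0 always does.
    static List<Integer> detect(int[][] hist, double alpha, int minShotLength) {
        List<Integer> starts = new ArrayList<>();
        starts.add(0);
        double mu = 0.0;    // running sum of distances in the current shot
        double sigma = 0.0; // running sum of squared distances
        int counter = 0;    // number of distances accumulated
        int prev = 0;       // first frame of the current shot
        for (int i = 1; i < hist.length; i++) {
            long d = l1(hist[i], hist[i - 1]);
            // As in the listing, the current distance joins the statistics first.
            mu += d;
            sigma += (double) d * d;
            counter++;
            double mean = mu / counter;
            double threshold = mean;
            if (counter > 1) {
                double variance = (sigma - counter * mean * mean) / (counter - 1);
                threshold += alpha * Math.sqrt(Math.max(0.0, variance));
            }
            if (d > threshold && i - prev > minShotLength) {
                starts.add(i); // shot boundary: restart the statistics
                prev = i;
                mu = 0.0;
                sigma = 0.0;
                counter = 0;
            }
        }
        return starts;
    }

    public static void main(String[] args) {
        // Synthetic 4-bin histograms: 25 nearly identical frames, then a cut.
        int[][] h = new int[40][];
        for (int i = 0; i < h.length; i++) {
            boolean odd = i % 2 == 1;
            h[i] = i < 25 ? (odd ? new int[] {99, 1, 0, 0} : new int[] {100, 0, 0, 0})
                          : (odd ? new int[] {0, 0, 1, 99} : new int[] {0, 0, 0, 100});
        }
        // alpha = 4.0 and minShotLength = 10 are the values set in run()
        System.out.println(detect(h, 4.0, 10)); // prints [0, 25]
    }
}
```

Because the distance under test is added to the statistics before the comparison, a single jump can only clear a threshold of alpha = 4.0 after at least 18 distances have been accumulated in the idealized case where all earlier distances are equal; the synthetic cut in main is therefore placed after 25 frames.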



1. A method of creating a summary of a plurality of video frames comprising the steps of:

(a) receiving a first frame;

(b) receiving a subsequent frame;

(c) resolving a homogeneity of content of said first frame and each said subsequent frame received;

(d) identifying at least one key frame from said first frame and each said subsequent frame received having
content most homogeneous with that of said first frame and each said subsequent frame;

(e) comparing said homogeneity of said first frame and each said subsequent frame with a threshold homogeneity; and

(f) repeating steps (b) through (e) until said homogeneity has some predefined relationship with said threshold
homogeneity.

2. The method of claim 1 wherein said homogeneity of content of said frames is resolved with a vector rank filter (70).
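One way to picture the vector rank filter (70), consistent with how Table B uses its acc_dif and super_acc arrays, is to order the feature vectors of a set by their cumulative distance to every other member; the first vector in that order is the vector median. The sketch below assumes the L1 histogram distance used in the listing; VectorRankFilterSketch and its methods are hypothetical names, not part of the listing.

```java
import java.util.Arrays;

// Hypothetical vector rank filter sketch: rank vectors by cumulative L1 distance.
class VectorRankFilterSketch {

    static long l1(int[] a, int[] b) {
        long sum = 0;
        for (int m = 0; m < a.length; m++) {
            sum += Math.abs(a[m] - b[m]);
        }
        return sum;
    }

    // Cumulative distance from each vector to every other vector of the set.
    static long[] cumulativeDistances(int[][] set) {
        long[] acc = new long[set.length];
        for (int a = 0; a < set.length; a++) {
            for (int b = a + 1; b < set.length; b++) {
                long d = l1(set[a], set[b]);
                acc[a] += d; // each pair is measured once and credited to both members
                acc[b] += d;
            }
        }
        return acc;
    }

    // Indices ordered from least to greatest cumulative distance; the first one is
    // the vector median. Arrays.sort on objects is stable, so ties keep input order.
    static Integer[] rank(int[][] set) {
        long[] acc = cumulativeDistances(set);
        Integer[] order = new Integer[set.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        Arrays.sort(order, (x, y) -> Long.compare(acc[x], acc[y]));
        return order;
    }

    public static void main(String[] args) {
        int[][] set = {{0}, {1}, {2}, {10}};
        System.out.println(Arrays.toString(rank(set))); // prints [1, 2, 0, 3]
    }
}
```

Computing each pairwise distance once and crediting it to both members halves the work compared with measuring every ordered pair, which is also how the listing updates acc_dif[b] and acc_dif[glob_i] together.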
(d) including said first feature vector in a set of segment feature vectors;

(e) defining a next feature vector representative of a subsequent video frame;

(f) including said next feature vector with said set of segment feature vectors;

(g) calculating a cumulative distance from said next feature vector to each said feature vector in said set of segment feature vectors;

(h) updating said cumulative distance;

(i) comparing said updated cumulative distance with said threshold value; and

(j) repeating steps (e) through (i) until said updated cumulative distance has a predefined relationship to said
threshold value thereby defining said second boundary.
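Steps (e) through (j) can be sketched as an incremental loop in which each new feature vector updates the cumulative distances of the current segment in a single pass, much as acc_dif is updated in Table B. This is a minimal illustration under stated assumptions: the L1 distance, a constant threshold factor relative to the minimum cumulative distance, and hypothetical class and method names. The listing itself uses a time-varying factor and an additional run-length rule.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of segmentation by cumulative distance.
class CumulativeDistanceSegmenter {

    static long l1(int[] a, int[] b) {
        long sum = 0;
        for (int m = 0; m < a.length; m++) {
            sum += Math.abs(a[m] - b[m]);
        }
        return sum;
    }

    // Returns the first frame of every segment. Frame i opens a new segment when its
    // cumulative distance to the current segment exceeds factor times the smallest
    // cumulative distance in the segment (both measured after frame i is added).
    static List<Integer> segmentStarts(int[][] v, double factor) {
        List<Integer> starts = new ArrayList<>();
        starts.add(0);
        long[] acc = new long[v.length]; // cumulative distances within the segment
        int base = 0;                     // first frame of the current segment
        for (int i = 1; i < v.length; i++) {
            long best = Long.MAX_VALUE;   // minimum cumulative distance (vector median)
            for (int b = base; b < i; b++) {
                long d = l1(v[b], v[i]);
                acc[i] += d;              // newcomer to each member
                acc[b] += d;              // each member to the newcomer
                best = Math.min(best, acc[b]);
            }
            best = Math.min(best, acc[i]);
            if (acc[i] > factor * best) {
                starts.add(i);            // second boundary found: frame i starts anew
                base = i;
                acc[i] = 0;
            }
        }
        return starts;
    }

    public static void main(String[] args) {
        int[][] v = {{0}, {1}, {0}, {1}, {50}, {51}, {50}};
        System.out.println(segmentStarts(v, 3.0)); // prints [0, 4]
    }
}
```

Each new frame costs one distance computation per member of the current segment, so the per-frame work is bounded by the segment length rather than by the length of the whole sequence.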


15. The method of claim 14 wherein a minimum of said cumulative distance is determined with a vector median filter
(12).

16. The method of claim 15 wherein said threshold value has a predefined relationship to said minimum cumulative distance of said set of segment feature vectors.

17. The method of claim 13 or 16 wherein said predefined relationship is a function of a noise power of an image in a
video frame.

18. The method of claim 13 or 16 wherein said predefined relationship is a function of a compression ratio to be
achieved by a summary of said plurality of video frames.

19. The method of claim 11 or 14 wherein said first feature vector and said next feature vector are vector signals
describing an image histogram.
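In Table B, the histogram feature vector is built by computeHistogram(), which converts each packed ARGB pixel to a luminance value with the weights 0.299, 0.587 and 0.114 and counts it in one of 256 bins. The standalone sketch below restates that computation; the class name is hypothetical, and the pixels are assumed to be already extracted (the listing uses PixelGrabber for that step).

```java
// Hypothetical sketch of the 256-bin luminance histogram used as a feature vector.
class LumaHistogramSketch {

    static final int HISTLEN = 256;

    static int[] histogram(int[] argbPixels) {
        int[] hist = new int[HISTLEN];
        for (int pixval : argbPixels) {
            int r = (pixval >> 16) & 0xff;
            int g = (pixval >> 8) & 0xff;
            int b = pixval & 0xff;
            // The weights sum to 1, so the truncated luminance stays within 0..255
            int lum = (int) ((0.299 * r) + (0.587 * g) + (0.114 * b));
            hist[lum]++;
        }
        return hist;
    }

    public static void main(String[] args) {
        int[] pixels = {0xFF000000, 0xFF000000, 0xFFFF0000, 0xFFFFFFFF};
        int[] hist = histogram(pixels);
        System.out.println(hist[0] + " " + hist[76]); // prints 2 1 (two black, one red)
    }
}
```

Because the histogram discards pixel positions, two frames of the same scene with small camera or object motion produce nearby vectors, which is what makes the L1 distance between histograms usable for both shot detection and keyframe selection.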

20. A method of creating a summary of a plurality of video frames comprising the steps of:

(a) defining a threshold distortion;

(b) locating a first frame in a segment of video frames having relatively homogeneous content;

(c) defining a first feature vector representative of said first frame;

(d) including said first feature vector in a set of segment feature vectors;

(e) defining a next feature vector representative of a subsequent video frame;

(f) including said next feature vector with said set of segment feature vectors;

(g) calculating the distortion of said set of segment feature vectors resulting from the inclusion of said next feature vector with said set of segment feature vectors;

(h) comparing said distortion of said set of segment feature vectors with said threshold distortion;

(i) repeating steps (e) through (h) until said distortion of said set of segment feature vectors has a predefined
relationship to said threshold distortion thereby defining a second boundary of said video segment;

(j) identifying a key feature vector that minimizes said distortion of said set of segment feature vectors; and

(k) identifying said video frame corresponding to said key feature vector as a key video frame.

21. The method of claim 20 wherein said distortion of said set of segment feature vectors is determined with a vector
rank filter (70).

22. The method of claim 20 wherein said threshold distortion has a predefined relationship to a minimum distortion of
said set of segment feature vectors.

23. The method of claim 22 wherein said distortion of said set of segment feature vectors is determined with a vector
rank filter (70).

24. A method of creating a summary of a plurality of video frames comprising the steps of:

(a) defining a threshold value;

(b) locating a first frame in a segment of video frames having relatively homogeneous content;

(c) defining a first feature vector representative of said first frame;

(d) including said first feature vector in a set of segment feature vectors;

(e) defining a next feature vector representative of a subsequent video frame;

(f) including said next feature vector with said set of segment feature vectors;

(g) calculating a cumulative distance from said next feature vector to each said feature vector in said set of segment feature vectors;

(h) updating said cumulative distance;

(i) comparing said updated cumulative distance with said threshold value;

(j) repeating steps (e) through (i) until said updated cumulative distance has a predefined relationship to said
threshold value thereby defining said second boundary of said video segment;

(k) identifying a key feature vector that minimizes said cumulative distance; and

(l) identifying said video frame corresponding to said key feature vector as a key video frame.

25. The method of claim 24 wherein a minimum of said cumulative distance is determined with a vector median filter
(12).

26. The method of claim 24 wherein said threshold value has a predefined relationship to said minimum cumulative distance of said set of segment feature vectors.
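In Table B, the relationship between the threshold and the minimum cumulative distance is expressed through the variables frac_num and frac_deno. Writing $L$ for seg_maxlen (45), $s$ for seg_thresh (17), $\ell$ for seg_len, $D_{\text{new}}$ for acc_dif[glob_i] and $D_{\min}$ for acc_dif[vm_index], the listing's keyframe test can be restated as follows. This is a rewriting of the listing's arithmetic, not additional claim language.

```latex
\[
  (L + s\,\ell)\, D_{\text{new}} \;\ge\; L(1+s)\, D_{\min}
  \quad\Longleftrightarrow\quad
  \frac{D_{\text{new}}}{D_{\min}} \;\ge\; T(\ell) = \frac{L(1+s)}{L + s\,\ell}
  \qquad (D_{\min} > 0)
\]
\[
  T(0) = 1 + s = 18, \qquad
  T(L) = \frac{L(1+s)}{L + sL} = 1 .
\]
```

Because vm_index is the frame with the smallest cumulative distance in the segment, $D_{\text{new}} \ge D_{\min}$ always holds, so the test is certain to fire once seg_len reaches seg_maxlen; at the start of a segment, the newest frame's cumulative distance must be at least 18 times the minimum. When $D_{\min} = 0$ the left-hand inequality holds trivially.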

27. The method of claim 26 wherein said minimum cumulative distance is determined with a vector rank filter (70).

28. The method of claim 20 or 24 wherein said first feature vector and said next feature vector are vector signals
describing an image histogram.

29. A method of creating a summary of a plurality of video frames comprising the steps of:

(a) dividing said plurality of video frames into at least one video segment of relatively homogeneous content
comprising at least one said video frame;

(b) defining feature vectors representative of each of said video frames;

(c) ranking said feature vectors representing said video frames included in said video segment according to a
distortion produced in a set of said feature vectors representing said segment by each of said feature vectors
in said set of feature vectors; and

(d) including in said summary said video frames represented by said feature vectors ranked as producing a
minimum of said distortion.
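Claims 29 and 31 rank the frames of each segment by the distortion they produce and place the best-ranked frames in the summary. The sketch below assumes the cumulative L1 distance as the distortion measure and builds summaries at successive levels, where level k keeps the k best-ranked frames of every segment, giving a hierarchy of summaries of increasing size. The class and method names are hypothetical, and the segment boundaries are taken as given.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of a hierarchy of summaries built from per-segment ranks.
class HierarchicalSummarySketch {

    static long l1(int[] a, int[] b) {
        long sum = 0;
        for (int m = 0; m < a.length; m++) {
            sum += Math.abs(a[m] - b[m]);
        }
        return sum;
    }

    // Frames lo..hi (inclusive) ordered by increasing cumulative distance.
    static List<Integer> rankSegment(int[][] v, int lo, int hi) {
        long[] acc = new long[hi - lo + 1];
        for (int a = lo; a <= hi; a++) {
            for (int b = a + 1; b <= hi; b++) {
                long d = l1(v[a], v[b]);
                acc[a - lo] += d;
                acc[b - lo] += d;
            }
        }
        List<Integer> order = new ArrayList<>();
        for (int f = lo; f <= hi; f++) {
            order.add(f);
        }
        order.sort((x, y) -> Long.compare(acc[x - lo], acc[y - lo])); // stable sort
        return order;
    }

    // Summary at the given level, in temporal order. segmentStarts must be sorted and
    // begin with 0; the last segment runs to the final frame.
    static List<Integer> summary(int[][] v, int[] segmentStarts, int level) {
        List<Integer> frames = new ArrayList<>();
        for (int s = 0; s < segmentStarts.length; s++) {
            int lo = segmentStarts[s];
            int hi = (s + 1 < segmentStarts.length) ? segmentStarts[s + 1] - 1 : v.length - 1;
            List<Integer> ranked = rankSegment(v, lo, hi);
            frames.addAll(ranked.subList(0, Math.min(level, ranked.size())));
        }
        Collections.sort(frames);
        return frames;
    }

    public static void main(String[] args) {
        int[][] v = {{0}, {1}, {2}, {10}, {11}, {30}};
        int[] starts = {0, 3};
        System.out.println(summary(v, starts, 1)); // prints [1, 4]
        System.out.println(summary(v, starts, 2)); // prints [0, 1, 3, 4]
    }
}
```

Level 1 corresponds to claim 29 (one minimum-distortion frame per segment); higher levels add frames ranked as producing greater distortion, in the manner of claim 31.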

30. The method of claim 29 wherein said relative distortion is determined with a vector rank filter (70).

31. The method of claim 29 further comprising the step of including in said summary said video frames represented by
said feature vectors ranked as producing relative distortion greater than said minimum of said relative distortion.

32. The method of claim 29 wherein the step of dividing said plurality of video frames into at least one video segment
comprises the steps of:

(a) defining a threshold distortion;

(b) locating a first frame in said video segment;

(c) defining a first feature vector representative of said first frame;

(d) including said first feature vector in a set of segment feature vectors;

(e) defining a next feature vector representative of a subsequent video frame;

(f) including said next feature vector with said set of segment feature vectors;

(g) calculating a distortion of said set of segment feature vectors resulting from including said next feature vector with said set of segment feature vectors;

(h) comparing said distortion of said set of segment feature vectors with said threshold distortion; and

(i) repeating steps (e) through (h) until said distortion of said set of segment feature vectors has a predefined
relationship to said threshold distortion thereby defining a second boundary of said segment.

33. The method of claim 32 wherein said distortion of said set of segment feature vectors is determined with a vector
rank filter (70).

34. The method of claim 32 wherein said threshold distortion has a predefined relationship to a minimum of said distortion.

35. A method of creating a summary of a plurality of video frames comprising the steps of:
45. The apparatus (80) of claim 43 or 44 wherein said vector signals characterize image histograms of said frames.

46. An apparatus (80) for identifying a last frame in a segment of video frames of relatively homogeneous content
within a plurality of video frames (82) comprising:

(a) a receiver (84) to receive said plurality of frames (82);

(b) a vector converter (86) to characterize each of said plurality of frames (82) as a vector signal;

(c) an accumulator (88) to accumulate a set of said vector signals comprising said vector signal corresponding
to a first frame in said segment and successively thereafter said vector signal corresponding to at least one
subsequent frame;

(d) a vector filter (90) to calculate a distortion of said set of vector signals as each said vector signal is accumulated with said set of vector signals; and

(e) a comparator (92) to identify said frame (82) corresponding to said vector signal producing said distortion
of said set of vector signals having a predefined relationship to a specified threshold distortion thereby identifying said last frame in said segment.

47. The method of claim 46 wherein said threshold distortion has a predefined relationship to a minimum of said distortion of said set of vector signals.

48. An apparatus (80) for identifying a last frame in a segment of video frames of relatively homogeneous content
within a plurality of video frames (82) comprising:

(a) a receiver (84) to receive said plurality of frames (82);

(b) a vector converter (86) to characterize each of said plurality of frames (82) as a vector signal;

(c) an accumulator (88) to accumulate a set of said vector signals comprising said vector signal corresponding
to a first frame in said segment and successively thereafter said vector signal corresponding to at least one
subsequent frame;

(d) a vector filter (90) to calculate a relative cumulative distance from said vector signal to each other said vector signal of said set of vector signals as each said vector signal is accumulated with said set of vector signals; and

(e) a comparator (92) to identify said frame corresponding to said vector signal characterized by said relative
cumulative distance having a predefined relationship to a specified threshold value thereby identifying said last
frame in said segment.

49. The method of claim 48 wherein said threshold value has a predefined relationship to a minimum of said cumulative
distance.

50. An apparatus (80) for producing a summary of a plurality of video frames (82) comprising:

(a) a receiver (84) to receive said plurality of frames (82);

(b) a vector converter (86) to characterize each of said plurality of frames (82) as a vector signal;

(c) an accumulator (88) to accumulate a set of said vector signals comprising said vector signal corresponding
to a first frame in a segment of frames having relatively homogeneous content and successively thereafter said
vector signal corresponding to at least one subsequent frame;

(d) a vector filter (90) to calculate a distortion in said set of vector signals as each said vector signal is accumulated in said set of vector signals and rank each said vector signal relative to each other said vector signal
in said set of vector signals according to a relative measure of said distortion;

(e) a comparator (92) to identify said vector signal producing said distortion in said set of vector signals having
a predefined relationship to a specified threshold distortion thereby identifying a last frame in said segment of
frames (82) of relatively homogeneous content; and

(f) a skimmer (94) to identify each said frame (82) in said segment corresponding to each of said vector signal
producing said distortion of a specified relative rank.

51. The apparatus (80) of claim 50 wherein said threshold distortion has a predefined relationship to a minimum distortion of said set of vector signals.

52. An apparatus (80) for producing a summary of a plurality of video frames (82) comprising: